(57) Abstract: Disclosed herein are methods and compositions for targeted cleavage of a genomic sequence, targeted alteration
of a genomic sequence, and targeted recombination between a genomic region and an exogenous polynucleotide homologous to 
the genomic region. The compositions include fusion proteins comprising a cleavage domain (or cleavage half-domain) and an 
engineered zinc finger domain and polynucleotides encoding same. Methods for targeted cleavage include introduction of such 
fusion proteins, or polynucleotides encoding same, into a cell. Methods for targeted recombination additionally include introduction 
of an exogenous polynucleotide homologous to a genomic region into cells comprising the disclosed fusion proteins. 
METHODS AND COMPOSITIONS FOR TARGETED CLEAVAGE
AND RECOMBINATION 

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of the following U.S. provisional patent 
applications: 60/493,931 filed August 8, 2003; 60/518,253 filed November 7, 2003; 
60/530,541 filed December 18, 2003; 60/542,780 filed February 5, 2004; 60/556,831 
filed March 26, 2004 and 60/575,919 filed June 1, 2004; the disclosures of which are 
incorporated by reference in their entireties for all purposes.

STATEMENT OF RIGHTS TO INVENTIONS 
MADE UNDER FEDERALLY SPONSORED RESEARCH 
Not applicable. 

TECHNICAL FIELD 
The present disclosure is in the field of genome engineering and homologous 
recombination. 

BACKGROUND

A major area of interest in genome biology, especially in light of the 
determination of the complete nucleotide sequences of a number of genomes, is the 
targeted alteration of genome sequences. To provide but one example, sickle cell 
anemia is caused by mutation of a single nucleotide pair in the human β-globin gene.
Thus, the ability to convert the endogenous genomic copy of this mutant nucleotide
pair to the wild-type sequence in a stable fashion and produce normal β-globin would
provide a cure for sickle cell anemia. 

Attempts have been made to alter genomic sequences in cultured cells by 
taking advantage of the natural phenomenon of homologous recombination. See, for
example, Capecchi (1989) Science 244:1288-1292; U.S. Patent Nos. 6,528,313 and
6,528,314. If a polynucleotide has sufficient homology to the genomic region
containing the sequence to be altered, it is possible for part or all of the sequence of 
the polynucleotide to replace the genomic sequence by homologous recombination. 
However, the frequency of homologous recombination under these circumstances is
extremely low. Moreover, the frequency of insertion of the exogenous polynucleotide 
at genomic locations that lack sequence homology exceeds the frequency of 
homologous recombination by several orders of magnitude.

The introduction of a double-stranded break into genomic DNA, in the region
of the genome bearing homology to an exogenous polynucleotide, has been shown to
stimulate homologous recombination at this site by several thousand-fold in cultured 
cells. Rouet et al (1994) Mol Cell Biol 14:8096-8106; Choulika et al (1995) Mol
Cell Biol 15:1968-1973; Donoho et al (1998) Mol Cell Biol 18:4070-4078. See
also Johnson et al (2001) Biochem. Soc. Trans. 29:196-201; and Yanez et al (1998)
Gene Therapy 5:149-159. In these methods, DNA cleavage in the desired genomic
region was accomplished by inserting a recognition site for a meganuclease (i.e., an
endonuclease whose recognition sequence is so large that it does not occur, or occurs
only rarely, in the genome of interest) into the desired genomic region.

However, meganuclease cleavage-stimulated homologous recombination
relies on either the fortuitous presence of, or the directed insertion of, a suitable
meganuclease recognition site in the vicinity of the genomic region to be altered. 
Since meganuclease recognition sites are rare (or nonexistent) in a typical mammalian 
genome, and insertion of a suitable meganuclease recognition site is plagued with the 
same difficulties as associated with other genomic alterations, these methods are not
broadly applicable. 

Thus, there remains a need for compositions and methods for targeted 
alteration of sequences in any genome. 

SUMMARY

The present disclosure provides compositions and methods for targeted 
cleavage of cellular chromatin in a region of interest and/or homologous 
recombination at a predetermined region of interest in cells. Cells include cultured 
cells, cells in an organism and cells that have been removed from an organism for
treatment in cases where the cells and/or their descendants will be returned to the
organism after treatment. A region of interest in cellular chromatin can be, for 
example, a genomic sequence or portion thereof. Compositions include fusion 
polypeptides comprising an engineered zinc finger binding domain (e.g., a zinc finger 
binding domain having a novel specificity) and a cleavage domain, and fusion
polypeptides comprising an engineered zinc finger binding domain and a cleavage
half-domain. Cleavage domains and cleavage half-domains can be obtained, for
example, from various restriction endonucleases and/or homing endonucleases.

Cellular chromatin can be present in any type of cell including, but not limited
to, prokaryotic and eukaryotic cells, fungal cells, plant cells, animal cells, mammalian
cells, primate cells and human cells. 

In one aspect, a method for cleavage of cellular chromatin in a region of 
interest (e.g., a method for targeted cleavage of genomic sequences) is provided, the
method comprising: (a) selecting a first sequence in the region of interest; (b)
engineering a first zinc finger binding domain to bind to the first sequence; and (c)
expressing a first fusion protein in the cell, the first fusion protein comprising the first
engineered zinc finger binding domain and a cleavage domain; wherein the first
fusion protein binds to the first sequence and the cellular chromatin is cleaved in the
region of interest. The site of cleavage can be coincident with the sequence to which
the fusion protein binds, or it can be adjacent (e.g., separated from the near edge of
the binding site by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more nucleotides).
A fusion protein can be expressed in a cell, e.g., by delivering the fusion protein to the
cell or by delivering a polynucleotide encoding the fusion protein to a cell, wherein
the polynucleotide, if DNA, is transcribed, and an RNA molecule delivered to the cell
or a transcript of a DNA molecule delivered to the cell is translated, to generate the 
fusion protein. Methods for polynucleotide and polypeptide delivery to cells are 
presented elsewhere in this disclosure. 

In certain embodiments, the cleavage domain may comprise two cleavage
half-domains that are covalently linked in the same polypeptide. The two cleavage
half-domains can be derived from the same endonuclease or from different 
endonucleases. 

In additional embodiments, targeted cleavage of cellular chromatin in a region 
of interest is achieved by expressing two fusion proteins in a cell, each fusion protein 
comprising a zinc finger binding domain and a cleavage half-domain. One or both of
the zinc finger binding domains of the fusion proteins can be engineered to bind to a 
target sequence in the vicinity of the intended cleavage site. If expression of the 
fusion proteins is by polynucleotide delivery, each of the two fusion proteins can be 
encoded by a separate polynucleotide, or a single polynucleotide can encode both
fusion proteins. 

Accordingly, a method for cleaving cellular chromatin in a region of interest 
can comprise (a) selecting a first sequence in the region of interest; (b) engineering a
first zinc finger binding domain to bind to the first sequence; (c) expressing a first
fusion protein in the cell, the first fusion protein comprising the first zinc finger
binding domain and a first cleavage half-domain; and (d) expressing a second fusion
protein in the cell, the second fusion protein comprising a second zinc finger binding
domain and a second cleavage half-domain, wherein the first fusion protein binds to
the first sequence, and the second fusion protein binds to a second sequence located
between 2 and 50 nucleotides from the first sequence, thereby positioning the 
cleavage half-domains such that the cellular chromatin is cleaved in the region of 
interest. 

In certain embodiments, binding of the first and second fusion proteins 
positions the cleavage half-domains such that a functional cleavage domain is
reconstituted. 

In certain embodiments, the second zinc finger binding domain is engineered 
to bind to the second sequence. In further embodiments, the first and second cleavage 
half-domains are derived from the same endonuclease, which can be, for example, a
restriction endonuclease (e.g., a Type IIS restriction endonuclease such as Fok I) or a
homing endonuclease. 

In other embodiments, any of the methods described herein may comprise (a) 
selecting first and second sequences in a region of interest, wherein the first and 
second sequences are between 2 and 50 nucleotides apart; (b) engineering a first zinc
finger binding domain to bind to the first sequence; (c) engineering a second zinc
finger binding domain to bind to the second sequence; (d) expressing a first fusion
protein in the cell, the first fusion protein comprising the first engineered zinc finger
binding domain and a first cleavage half-domain; (e) expressing a second fusion
protein in the cell, the second fusion protein comprising the second engineered zinc
finger binding domain and a second cleavage half-domain; wherein the first fusion
protein binds to the first sequence and the second fusion protein binds to the second 
sequence, thereby positioning the first and second cleavage half-domains such that the 
cellular chromatin is cleaved in the region of interest. 
In certain embodiments, the first and second cleavage half-domains are
derived from the same endonuclease, for example, a Type IIS restriction 
endonuclease, for example, Fok I. In additional embodiments, cellular chromatin is 
cleaved at one or more sites between the first and second sequences to which the 
fusion proteins bind.

In further embodiments, a method for cleavage of cellular chromatin in a 
region of interest comprises (a) selecting the region of interest; (b) engineering a first 
zinc finger binding domain to bind to a first sequence in the region of interest; (c) 
providing a second zinc finger binding domain which binds to a second sequence in
the region of interest, wherein the second sequence is located between 2 and 50
nucleotides from the first sequence; (d) expressing a first fusion protein in the cell, the
first fusion protein comprising the first zinc finger binding domain and a first
cleavage half-domain; and (e) expressing a second fusion protein in the cell, the
second fusion protein comprising the second zinc finger binding domain and a second
cleavage half-domain; wherein the first fusion protein binds to the first sequence, and
the second fusion protein binds to the second sequence, thereby positioning the 
cleavage half-domains such that the cellular chromatin is cleaved in the region of 
interest. 

In any of the methods described herein, the first and second cleavage
half-domains may be derived from the same endonuclease or from different
endonucleases. In additional embodiments, the second zinc finger binding domain is
engineered to bind to the second sequence.

If one or more polynucleotides encoding the fusion proteins are introduced 
into the cell, an exemplary method for targeted cleavage of cellular chromatin in a 
region of interest comprises (a) selecting the region of interest; (b) engineering a first
zinc finger binding domain to bind to a first sequence in the region of interest; (c) 
providing a second zinc finger binding domain which binds to a second sequence in 
the region of interest, wherein the second sequence is located between 2 and 50 
nucleotides from the first sequence; and (d) contacting a cell with (i) a first 
polynucleotide encoding a first fusion protein, the fusion protein comprising the first
zinc finger binding domain and a first cleavage half-domain, and (ii) a second
polynucleotide encoding a second fusion protein, the fusion protein comprising the
second zinc finger binding domain and a second cleavage half-domain; wherein the
first and second fusion proteins are expressed, the first fusion protein binds to the first
sequence and the second fusion protein binds to the second sequence, thereby
positioning the cleavage half-domains such that the cellular chromatin is cleaved in
the region of interest. In a variation of this method, a cell is contacted with a single
polynucleotide which encodes both fusion proteins.

WO 2005/014791 PCT/US2004/025407

For any of the aforementioned methods, the cellular chromatin can be in a 
chromosome, episome or organellar genome. In addition, in any of the methods 
described herein, at least one zinc finger binding domain is engineered, for example 
by design or selection methods. 

Similarly, for any of the aforementioned methods, the cleavage half-domain
can be derived from, for example, a homing endonuclease or a restriction
endonuclease, for example, a Type IIS restriction endonuclease. An exemplary Type
IIS restriction endonuclease is Fok I.

For any of the methods of targeted cleavage, targeted mutagenesis and/or
targeted recombination disclosed herein utilizing fusion proteins comprising a
cleavage half-domain, the near edges of the binding sites of the fusion proteins can be
separated by 5 or 6 base pairs. In these embodiments, the binding domain and the 
cleavage domain of the fusion proteins can be separated by a linker of 4 amino acid 
residues. 

In certain embodiments, it is possible to obtain increased cleavage specificity
by utilizing fusion proteins in which one or both cleavage half-domains contains an
alteration in the amino acid sequence of the dimerization interface. 

Targeted mutagenesis of a region of interest in cellular chromatin can occur 
when a targeted cleavage event, as described above, is followed by non-homologous
end joining (NHEJ). Accordingly, methods for alteration of a first nucleotide
sequence in a region of interest in cellular chromatin are provided, wherein the 
methods comprise the steps of (a) engineering a first zinc finger binding domain to 
bind to a second nucleotide sequence in the region of interest, wherein the second 
sequence comprises at least 9 nucleotides; (b) providing a second zinc finger binding
domain to bind to a third nucleotide sequence, wherein the third sequence comprises
at least 9 nucleotides and is located between 2 and 50 nucleotides from the second 
sequence; (c) expressing a first fusion protein in the cell, the first fusion protein 
comprising the first zinc finger binding domain and a first cleavage half-domain; and 
(d) expressing a second fusion protein in the cell, the second fusion protein
comprising the second zinc finger binding domain and a second cleavage half-domain;
wherein the first fusion protein binds to the second sequence, and the second fusion
protein binds to the third sequence, thereby positioning the cleavage half-domains
such that the cellular chromatin is cleaved in the region of interest and the cleavage
site is subjected to non-homologous end joining. 

Targeted mutations resulting from the aforementioned method include, but are 
not limited to, point mutations (i.e., conversion of a single base pair to a different base
pair), substitutions (i.e., conversion of a plurality of base pairs to a different sequence
of identical length), insertions of one or more base pairs, deletions of one or more
base pairs and any combination of the aforementioned sequence alterations. 

Methods for targeted recombination (for, e.g., alteration or replacement of a 
sequence in a chromosome or a region of interest in cellular chromatin) are also 
provided. For example, a mutant genomic sequence can be replaced by a wild-type
sequence, e.g., for treatment of genetic disease or inherited disorders. In addition, a
wild-type genomic sequence can be replaced by a mutant sequence, e.g., to prevent 
function of an oncogene product or a product of a gene involved in an inappropriate 
inflammatory response. Furthermore, one allele of a gene can be replaced by a 
different allele. 

In such methods, one or more targeted nucleases create a double-stranded
break in cellular chromatin at a predetermined site, and a donor polynucleotide,
having homology to the nucleotide sequence of the cellular chromatin in the region of
the break, is introduced into the cell. Cellular DNA repair processes are activated by
the presence of the double-stranded break and the donor polynucleotide is used as a
template for repair of the break, resulting in the introduction of all or part of the
nucleotide sequence of the donor into the cellular chromatin. Thus a first sequence in
cellular chromatin can be altered and, in certain embodiments, can be converted into a 
sequence present in a donor polynucleotide. 

In this context, the use of the terms "replace" or "replacement" can be
understood to represent replacement of one nucleotide sequence by another (i.e.,
replacement of a sequence in the informational sense), and does not necessarily 
require physical or chemical replacement of one polynucleotide by another. 
Accordingly, in one aspect, a method for replacement of a region of interest in
cellular chromatin (e.g., a genomic sequence) with a first nucleotide sequence is 
provided, the method comprising: (a) engineering a zinc finger binding domain to 
bind to a second sequence in the region of interest; (b) expressing a fusion protein in a 
cell, the fusion protein comprising the zinc finger binding domain and a cleavage
domain; and (c) contacting the cell with a polynucleotide comprising the first 
nucleotide sequence; wherein the fusion protein binds to the second sequence such 
that the cellular chromatin is cleaved in the region of interest and a nucleotide 
sequence in the region of interest is replaced with the first nucleotide sequence. 
Generally, cellular chromatin is cleaved in the region of interest at or adjacent to the
second sequence. In further embodiments, the cleavage domain comprises two 
cleavage half-domains, which can be derived from the same or from different 
nucleases. 

In addition, a method for replacement of a region of interest in cellular
chromatin (e.g., a genomic sequence) with a first nucleotide sequence is provided, the
method comprising: (a) engineering a first zinc finger binding domain to bind to a 
second sequence in the region of interest; (b) providing a second zinc finger binding 
domain to bind to a third sequence in the region of interest; (c) expressing a first 
fusion protein in a cell, the first fusion protein comprising the first zinc finger binding
domain and a first cleavage half-domain; (d) expressing a second fusion protein in the
cell, the second fusion protein comprising the second zinc finger binding domain and 
a second cleavage half-domain; and (e) contacting the cell with a polynucleotide 
comprising the first nucleotide sequence; wherein the first fusion protein binds to the 
second sequence and the second fusion protein binds to the third sequence, thereby
positioning the cleavage half-domains such that the cellular chromatin is cleaved in
the region of interest and a nucleotide sequence in the region of interest is replaced 
with the first nucleotide sequence. Generally, cellular chromatin is cleaved in the 
region of interest at a site between the second and third sequences. 

Additional methods for replacement of a region of interest in cellular
chromatin (e.g., a genomic sequence) with a first nucleotide sequence comprise: (a)
selecting a second sequence, wherein the second sequence is in the region of interest 
and has a length of at least 9 nucleotides; (b) engineering a first zinc finger binding 
domain to bind to the second sequence; (c) selecting a third sequence, wherein the 
third sequence has a length of at least 9 nucleotides and is located between 2 and 50
nucleotides from the second sequence; (d) providing a second zinc finger binding 
domain to bind to the third sequence; (e) expressing a first fusion protein in a cell, the 
first fusion protein comprising the first zinc finger binding domain and a first 
cleavage half-domain; (f) expressing a second fusion protein in the cell, the second
fusion protein comprising the second zinc finger binding domain and a second 
cleavage half-domain; and (g) contacting the cell with a polynucleotide comprising 
the first nucleotide sequence; wherein the first fusion protein binds to the second 
sequence and the second fusion protein binds to the third sequence, thereby
positioning the cleavage half-domains such that the cellular chromatin is cleaved in
the region of interest and a nucleotide sequence in the region of interest is replaced 
with the first nucleotide sequence. Generally, cellular chromatin is cleaved in the 
region of interest at a site between the second and third sequences. 

In another aspect, methods for targeted recombination are provided in which a
first nucleotide sequence, located in a region of interest in cellular chromatin, is
replaced with a second nucleotide sequence. The methods comprise (a) engineering a
first zinc finger binding domain to bind to a third sequence in the region of interest; 
(b) providing a second zinc finger binding domain to bind to a fourth sequence; (c) 
expressing a first fusion protein in a cell, the fusion protein comprising the first zinc
finger binding domain and a first cleavage half-domain; (d) expressing a second
fusion protein in the cell, the second fusion protein comprising the second zinc finger
binding domain and a second cleavage half-domain; and (e) contacting a cell with a
polynucleotide comprising the second nucleotide sequence; wherein the first fusion
protein binds to the third sequence and the second fusion protein binds to the fourth
sequence, thereby positioning the cleavage half-domains such that the cellular
chromatin is cleaved in the region of interest and the first nucleotide sequence is 
replaced with the second nucleotide sequence. 

In additional embodiments, a method for alteration of a first nucleotide 
sequence in a region of interest in cellular chromatin is provided, the method
comprising the steps of (a) engineering a first zinc finger binding domain to bind to a
second nucleotide sequence in the region of interest, wherein the second sequence 
comprises at least 9 nucleotides; (b) providing a second zinc finger binding domain to 
bind to a third nucleotide sequence, wherein the third sequence comprises at least 9 
nucleotides and is located between 2 and 50 nucleotides from the second sequence;
(c) expressing a first fusion protein in the cell, the first fusion protein comprising the 
first zinc finger binding domain and a first cleavage half-domain; (d) expressing a 
second fusion protein in the cell, the second fusion protein comprising the second zinc 
finger binding domain and a second cleavage half-domain; and (e) contacting the cell
with a polynucleotide comprising a fourth nucleotide sequence, wherein the fourth 
nucleotide sequence is homologous but non-identical with the first nucleotide 
sequence; wherein the first fusion protein binds to the second sequence, and the 
second fusion protein binds to the third sequence, thereby positioning the cleavage
half-domains such that the cellular chromatin is cleaved in the region of interest and
the first nucleotide sequence is altered. In certain embodiments, the first nucleotide 
sequence is converted to the fourth nucleotide sequence. In additional embodiments, 
the second and third nucleotide sequences (i.e., the binding sites for the fusion 
proteins) are present in the polynucleotide comprising the fourth nucleotide sequence
(i.e., the donor polynucleotide) and the polynucleotide comprising the fourth
nucleotide sequence is cleaved. 

In the aforementioned methods for targeted recombination, the binding sites 
for the fusion proteins (i.e., the third and fourth sequences) can comprise any number 
of nucleotides. Preferably, they are at least nine nucleotides in length, but they can
also be larger (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18 and up to 100 nucleotides,
including any integral value between 9 and 100 nucleotides); moreover the third and
fourth sequences need not be the same length. The distance between the binding sites 
(i.e., the length of nucleotide sequence between the third and fourth sequences) can be 
any integral number of nucleotide pairs between 2 and 50 (e.g., 5 or 6 base pairs) as
measured from the near end of one binding site to the near end of the other binding
site. 
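
By way of illustration only, the length and spacing conditions described above can be checked computationally. The following is a minimal sketch, not part of the disclosed methods: the names `BindingSite`, `near_edge_gap` and `satisfies_constraints` are hypothetical, binding sites are assumed to be given as 0-based, half-open intervals on a single reference sequence, and the range "between 2 and 50" is assumed to be inclusive.

```python
from typing import NamedTuple


class BindingSite(NamedTuple):
    """Hypothetical binding site as a 0-based, half-open interval [start, end)."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def near_edge_gap(a: BindingSite, b: BindingSite) -> int:
    """Count the nucleotide pairs lying between the near edges of two sites."""
    left, right = sorted((a, b))  # order the sites by start coordinate
    if right.start < left.end:
        raise ValueError("binding sites overlap")
    return right.start - left.end


def satisfies_constraints(a: BindingSite, b: BindingSite,
                          min_len: int = 9,
                          min_gap: int = 2,
                          max_gap: int = 50) -> bool:
    """Check the preferred minimum site length and the 2-50 spacing (assumed inclusive)."""
    if a.length < min_len or b.length < min_len:
        return False
    try:
        gap = near_edge_gap(a, b)
    except ValueError:
        return False
    return min_gap <= gap <= max_gap
```

For example, two 9-nucleotide sites at `BindingSite(100, 109)` and `BindingSite(115, 124)` have near edges separated by 6 base pairs and would pass this check.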

In the aforementioned methods for targeted recombination, cellular chromatin 
can be cleaved at a site located between the binding sites of the two fusion proteins. 
In certain embodiments, the binding sites are on opposite DNA strands. Moreover, 
expression of the fusion proteins in the cell can be accomplished either by
introduction of the proteins into the cell or by introduction of one or more 
polynucleotides into the cell, which are optionally transcribed (if the polynucleotide is 
DNA), and the transcript(s) translated, to produce the fusion proteins. For example, 
two polynucleotides, each comprising sequences encoding one of the two fusion
proteins, can be introduced into a cell. Alternatively, a single polynucleotide 
comprising sequences encoding both fusion proteins can be introduced into the cell. 

Thus, in one embodiment, a method for replacement of a region of interest in 
cellular chromatin (e.g., a genomic sequence) with a first nucleotide sequence
comprises: (a) engineering a first zinc finger binding domain to bind to a second 
sequence in the region of interest; (b) providing a second zinc finger binding domain 
to bind to a third sequence; and (c) contacting a cell with: 

(i) a first polynucleotide comprising the first nucleotide sequence;

(ii) a second polynucleotide encoding a first fusion protein, the first fusion
protein comprising the first zinc finger binding domain and a first cleavage
half-domain; and

(iii) a third polynucleotide encoding a second fusion protein, the second fusion
protein comprising the second zinc finger binding domain and a second cleavage
half-domain;

wherein the first and second fusion proteins are expressed, the first fusion 
protein binds to the second sequence and the second fusion protein binds to the third 
sequence, thereby positioning the cleavage half-domains such that the cellular 
chromatin is cleaved in the region of interest; and the region of interest is replaced
with the first nucleotide sequence.

In the preferred embodiments of methods for targeted recombination and/or 
replacement and/or alteration of a sequence in a region of interest in cellular 
chromatin, a chromosomal sequence is altered by homologous recombination with an 
exogenous "donor" nucleotide sequence. Such homologous recombination is
stimulated by the presence of a double-stranded break in cellular chromatin, if
sequences homologous to the region of the break are present. Double-strand breaks in
cellular chromatin can also stimulate cellular mechanisms of non-homologous end 
joining. 

In any of the methods described herein, the first nucleotide sequence (the 
"donor sequence") can contain sequences that are homologous, but not identical, to
genomic sequences in the region of interest, thereby stimulating homologous 
recombination to insert a non-identical sequence in the region of interest. Thus, in 
certain embodiments, portions of the donor sequence that are homologous to 
sequences in the region of interest exhibit between about 80 and 99% (or any integer
therebetween) sequence identity to the genomic sequence that is replaced. In other 
embodiments, the homology between the donor and genomic sequence is higher than 
99%, for example if only 1 nucleotide differs as between donor and genomic 
sequences of over 100 contiguous base pairs. In certain cases, a non-homologous
portion of the donor sequence can contain sequences not present in the region of 
interest, such that new sequences are introduced into the region of interest. In these 
instances, the non-homologous sequence is generally flanked by sequences of
50-1,000 base pairs (or any integral value therebetween) or any number of base pairs
greater than 1,000, that are homologous or identical to sequences in the region of
interest. In other embodiments, the donor sequence is non-homologous to the first 
sequence, and is inserted into the genome by non-homologous recombination 
mechanisms. 
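
As an illustrative aid only, the percent sequence identity referred to above can be computed for two pre-aligned sequences of equal length. This is a minimal sketch under stated assumptions: the function name `percent_identity` is hypothetical, the sequences are assumed to be already aligned without gaps, and comparison is case-insensitive.

```python
def percent_identity(donor: str, genomic: str) -> float:
    """Percent of positions at which two equal-length, gap-free aligned sequences match."""
    if len(donor) != len(genomic):
        raise ValueError("sequences must be aligned to the same length")
    if not donor:
        raise ValueError("sequences must not be empty")
    # Count position-wise matches, ignoring letter case.
    matches = sum(d == g for d, g in zip(donor.upper(), genomic.upper()))
    return 100.0 * matches / len(donor)
```

Under this definition, a donor that differs from the genomic sequence at 1 position out of 101 contiguous base pairs has an identity slightly above 99%, consistent with the higher-than-99% case mentioned above.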

In methods for targeted recombination and/or replacement and/or alteration of
a sequence of interest in cellular chromatin, the first and second cleavage
half-domains can be derived from the same endonuclease or from different endonucleases.
Endonucleases include, but are not limited to, homing endonucleases and restriction 
endonucleases. Exemplary restriction endonucleases are Type IIS restriction 
endonucleases; an exemplary Type IIS restriction endonuclease is Fok I. 

The region of interest can be in a chromosome, episome or organellar genome.
The region of interest can comprise a mutation, which can be replaced by a wild-type
sequence (or by a different mutant sequence), or the region of interest can contain a
wild-type sequence that is replaced by a mutant sequence or a different allele. 
Mutations include, but are not limited to, point mutations (transitions, transversions),
insertions of one or more nucleotide pairs, deletions of one or more nucleotide pairs,
rearrangements, inversions and translocations. Mutations can change the coding 
sequence, introduce premature stop codon(s) and/or modify the frequency of a 
repetitive sequence motif (e.g., trinucleotide repeat) in a gene. For applications in 
which targeted recombination is used to replace a mutant sequence, cellular chromatin
is generally cleaved at a site located within 100 nucleotides on either side of the
mutation, although cleavage sites located up to 6-10 kb from the site of a mutation can
also be used. 
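
Purely as an illustration, the distance guidance above can be expressed as a simple classification of a candidate cleavage site relative to a mutation. This is a sketch under assumptions: the function name and category labels are hypothetical, positions are assumed to be coordinates on the same reference sequence, and 10,000 nucleotides is taken as the upper bound of the "6-10 kb" range.

```python
def classify_cleavage_site(mutation_pos: int,
                           cleavage_pos: int,
                           typical_window: int = 100,
                           extended_window: int = 10_000) -> str:
    """Classify a cleavage site by its distance, on either side, from a mutation."""
    distance = abs(cleavage_pos - mutation_pos)
    if distance <= typical_window:
        return "typical"       # within 100 nucleotides of the mutation
    if distance <= extended_window:
        return "extended"      # farther away, up to the assumed 10 kb bound
    return "out of range"      # beyond the distances mentioned in the text
```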
In any of the methods described herein, the second zinc finger binding domain
can be engineered, for example designed and/or selected. 

Further, the donor polynucleotide can be DNA or RNA, can be linear or 
circular, and can be single-stranded or double-stranded. It can be delivered to the cell 
as naked nucleic acid, as a complex with one or more delivery agents (e.g., liposomes,
poloxamers) or contained in a viral delivery vehicle, such as, for example, an
adenovirus or an adeno-associated virus (AAV). Donor sequences can range in
length from 10 to 1,000 nucleotides (or any integral value of nucleotides
therebetween) or longer.

Similarly, polynucleotides encoding fusions between a zinc finger binding
domain and a cleavage domain or half-domain can be DNA or RNA, can be linear or
circular, and can be single-stranded or double-stranded. They can be delivered to the 
cell as naked nucleic acid, as a complex with one or more delivery agents (e.g., 
liposomes, poloxamers) or contained in a viral delivery vehicle, such as, for example, 
1 5 an adenovirus or an adeno-associated virus (AAV). A polynucleotide can encode one 
or more fusion proteins. 

In the methods for targeted recombination, as with the methods for targeted
cleavage, a cleavage domain or half-domain can be derived from any nuclease, e.g., a
homing endonuclease or a restriction endonuclease, in particular, a Type IIS
restriction endonuclease. Cleavage half-domains can be derived from the same or from
different endonucleases. An exemplary source, from which a cleavage half-domain
can be derived, is the Type IIS restriction endonuclease Fok I.

In certain embodiments, the frequency of homologous recombination can be
enhanced by arresting the cells in the G2 phase of the cell cycle and/or by activating
the expression of one or more molecules (protein, RNA) involved in homologous
recombination and/or by inhibiting the expression or activity of proteins involved in
non-homologous end-joining.

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 shows the nucleotide sequence, in double-stranded form, of a portion
of the human hSMC1L1 gene encoding the amino-terminal portion of the protein
(SEQ ID NO:1) and the encoded amino acid sequence (SEQ ID NO:2). Target
sequences for the hSMC1-specific ZFPs are underlined (one on each DNA strand).

Figure 2 shows a schematic diagram of a plasmid encoding a ZFP-FokI fusion
for targeted cleavage of the hSMC1 gene.

Figures 3A-D show a schematic diagram of the hSMC1 gene. Figure 3A
shows a schematic of a portion of the human X chromosome which includes the
hSMC1 gene. Figure 3B shows a schematic of a portion of the hSMC1 gene
including the upstream region (left of +1), the first exon (between +1 and the right end
of the arrow labeled "SMC1 coding sequence") and a portion of the first intron.
Locations of sequences homologous to the initial amplification primers and to the
chromosome-specific primer (see Table 3) are also provided. Figure 3C shows the
nucleotide sequence of the human X chromosome in the region of the SMC1 initiation
codon (SEQ ID NO: 3), the encoded amino acid sequence (SEQ ID NO: 4), and the
target sites for the SMC1-specific zinc finger proteins. Figure 3D shows the sequence
of the corresponding region of the donor molecule (SEQ ID NO: 5), with differences
between donor and chromosomal sequences underlined. Sequences contained in the
donor-specific amplification primer (Table 3) are indicated by double underlining.

Figure 4 shows a schematic diagram of the hSMC1 donor construct.

Figure 5 shows PCR analysis of DNA from transfected HEK293 cells. From
left, the lanes show results from cells transfected with a plasmid encoding GFP
(control plasmid), cells transfected with two plasmids, each of which encodes one of
the two hSMC1-specific ZFP-FokI fusion proteins (ZFPs only), cells transfected with
two concentrations of the hSMC1 donor plasmid (donor only), and cells transfected
with the two ZFP-encoding plasmids and the donor plasmid (ZFPs + donor). See
Example 1 for details.

Figure 6 shows the nucleotide sequence of an amplification product derived
from a mutated hSMC1 gene (SEQ ID NO:6) generated by targeted homologous
recombination. Sequences derived from the vector into which the amplification
product was cloned are single-underlined, chromosomal sequences not present in the
donor molecule are indicated by dashed underlining (nucleotides 32-97), sequences
common to the donor and the chromosome are not underlined (nucleotides 98-394 and
402-417), and sequences unique to the donor are double-underlined (nucleotides
395-401). Lower-case letters represent sequences that differ between the chromosome and
the donor.

Figure 7 shows the nucleotide sequence of a portion of the human IL2Rγ gene
comprising the 3' end of the second intron and the 5' end of the third exon (SEQ ID
NO:7) and the amino acid sequence encoded by the displayed portion of the third
exon (SEQ ID NO:8). Target sequences for the second pair of IL2Rγ-specific ZFPs
are underlined. See Example 2 for details.

Figure 8 shows a schematic diagram of a plasmid encoding a ZFP-FokI fusion
for targeted cleavage of the IL2Rγ gene.

Figures 9A-D show a schematic diagram of the IL2Rγ gene. Figure 9A shows
a schematic of a portion of the human X chromosome which includes the IL2Rγ gene.
Figure 9B shows a schematic of a portion of the IL2Rγ gene including a portion of the
second intron, the third exon and a portion of the third intron. Locations of sequences
homologous to the initial amplification primers and to the chromosome-specific
primer (see Table 5) are also provided. Figure 9C shows the nucleotide sequence of
the human X chromosome in the region of the third exon of the IL2Rγ gene (SEQ ID
NO: 9), the encoded amino acid sequence (SEQ ID NO: 10), and the target sites for
the first pair of IL2Rγ-specific zinc finger proteins. Figure 9D shows the sequence of
the corresponding region of the donor molecule (SEQ ID NO: 11), with differences
between donor and chromosomal sequences underlined. Sequences contained in the
donor-specific amplification primer (Table 5) are indicated by double overlining.

Figure 10 shows a schematic diagram of the IL2Rγ donor construct.

Figure 11 shows PCR analysis of DNA from transfected K562 cells. From
left, the lanes show results from cells transfected with two plasmids, each of which
encodes one of a pair of IL2Rγ-specific ZFP-FokI fusion proteins (ZFPs only, lane
1), cells transfected with two concentrations of the IL2Rγ donor plasmid (donor only,
lanes 2 and 3), and cells transfected with the two ZFP-encoding plasmids and the
donor plasmid (ZFPs + donor, lanes 4-7). Each of the two pairs of IL2Rγ-specific
ZFP-FokI fusions was used (identified as "pair 1" and "pair 2") and use of both pairs
resulted in production of the diagnostic amplification product (labeled "expected
chimeric product" in the Figure). See Example 2 for details.

Figure 12 shows the nucleotide sequence of an amplification product derived
from a mutated IL2Rγ gene (SEQ ID NO: 12) generated by targeted homologous
recombination. Sequences derived from the vector into which the amplification
product was cloned are single-underlined, chromosomal sequences not present in the
donor molecule are indicated by dashed underlining (nucleotides 460-552), sequences
common to the donor and the chromosome are not underlined (nucleotides 32-42 and
59-459), and a stretch of sequence containing nucleotides which distinguish donor
sequences from chromosomal sequences is double-underlined (nucleotides 44-58).
Lower-case letters represent nucleotides whose sequence differs between the
chromosome and the donor.

Figure 13 shows the nucleotide sequence of a portion of the human beta-globin
gene encoding segments of the core promoter, the first two exons and the first
intron (SEQ ID NO: 13). A missense mutation changing an A (in boldface and
underlined) at position 5212541 on Chromosome 11 (BLAT, UCSC Genome
Bioinformatics site) to a T results in sickle cell anemia. A first zinc finger/FokI
fusion protein was designed such that the primary contacts were with the underlined
12-nucleotide sequence AAGGTGAACGTG (nucleotides 305-316 of SEQ ID
NO:13), and a second zinc finger/FokI fusion protein was designed such that the
primary contacts were with the complement of the underlined 12-nucleotide sequence
CCGTTACTGCCC (nucleotides 325-336 of SEQ ID NO:13).
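
As a non-limiting sketch, the relationship between the two target sites described for Figure 13 can be worked through in Python. The second protein contacts the complement of the displayed sequence, so the sketch derives that strand read 5' to 3' (the reverse complement) and computes the number of nucleotides separating the two sites from the stated coordinates. The helper names are hypothetical; the sequences and coordinates are those recited above.

```python
# Illustrative sketch; helper names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of `seq`, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def spacer_length(site_a: tuple, site_b: tuple) -> int:
    """Number of nucleotides lying strictly between two sites.

    Each site is a (first, last) pair of 1-based inclusive coordinates
    on the same sequence; overlapping sites give a spacer of 0.
    """
    (_, end_first), (start_second, _) = sorted([site_a, site_b])
    return max(0, start_second - end_first - 1)


# Values recited for Figure 13 (coordinates within SEQ ID NO:13).
FIRST_SITE = "AAGGTGAACGTG"      # nucleotides 305-316
SECOND_SITE_TOP = "CCGTTACTGCCC"  # nucleotides 325-336, top strand as displayed

if __name__ == "__main__":
    # Strand contacted by the second fusion protein, read 5' to 3'.
    print(reverse_complement(SECOND_SITE_TOP))
    # Nucleotides between the two 12-nucleotide target sites.
    print(spacer_length((305, 316), (325, 336)))
```

With the coordinates given, nucleotides 317-324 lie between the two sites, so the sketch reports a separation of eight nucleotides.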

Figure 14 is a schematic diagram of a plasmid encoding a ZFP-FokI fusion for
targeted cleavage of the human beta globin gene.

Figure 15 is a schematic diagram of the cloned human beta globin gene
showing the upstream region, first and second exons, first intron and primer binding
sites.

Figure 16 is a schematic diagram of the beta globin donor construct,
pCR4-TOPO-HBBdonor.

Figure 17 shows PCR analysis of DNA from cells transfected with two pairs
of β-globin-specific ZFP nucleases and a beta globin donor plasmid. The panel on the
left is a loading control in which the initial amp 1 and initial amp 2 primers (Table 7)
were used for amplification. In the experiment shown in the right panel, the
"chromosome-specific" and "donor-specific" primers (Table 7) were used for
amplification. The leftmost lane in each panel contains molecular weight markers and
the next lane shows amplification products obtained from mock-transfected cells.
Remaining lanes, from left to right, show amplification product from cells transfected
with: a GFP-encoding plasmid, 100 ng of each ZFP/FokI-encoding plasmid, 200 ng of
each ZFP/FokI-encoding plasmid, 200 ng donor plasmid, 600 ng donor plasmid, 200
ng donor plasmid + 100 ng of each ZFP/FokI-encoding plasmid, and 600 ng donor
plasmid + 200 ng of each ZFP/FokI-encoding plasmid.

Figure 18 shows the nucleotide sequence of an amplification product derived
from a mutated beta-globin gene (SEQ ID NO: 14) generated by targeted homologous
recombination. Chromosomal sequences not present in the donor molecule are
indicated by dashed underlining (nucleotides 1-72), sequences common to the donor
and the chromosome are not underlined (nucleotides 73-376), and a stretch of
sequence containing nucleotides which distinguish donor sequences from
chromosomal sequences is double-underlined (nucleotides 377-408). Lower-case
letters represent nucleotides whose sequence differs between the chromosome and the
donor.

Figure 19 shows the nucleotide sequence of a portion of the fifth exon of the
Interleukin-2 receptor gamma chain (IL-2Rγ) gene (SEQ ID NO: 15). Also shown
(underlined) are the target sequences for the 5-8 and 5-10 ZFP/FokI fusion proteins.
See Example 5 for details.

Figure 20 shows the amino acid sequence of the 5-8 ZFP/FokI fusion targeted
to exon 5 of the human IL-2Rγ gene (SEQ ID NO:16). Amino acid residues 1-17
contain a nuclear localization sequence (NLS, underlined); residues 18-130 contain
the ZFP portion, with the recognition regions of the component zinc fingers shown in
boldface; the ZFP-FokI linker (ZC linker, underlined) extends from residues 131 to
140 and the FokI cleavage half-domain begins at residue 141 and extends to the end
of the protein at residue 336. The residue that was altered to generate the Q486E
mutation is shown underlined and in boldface.

Figure 21 shows the amino acid sequence of the 5-10 ZFP/FokI fusion
targeted to exon 5 of the human IL-2Rγ gene (SEQ ID NO: 17). Amino acid residues
1-17 contain a nuclear localization sequence (NLS, underlined); residues 18-133
contain the ZFP portion, with the recognition regions of the component zinc fingers
shown in boldface; the ZFP-FokI linker (ZC linker, underlined) extends from residues
134 to 143 and the FokI cleavage half-domain begins at residue 144 and extends to
the end of the protein at residue 339. The residue that was altered to generate the
E490K mutation is shown underlined and in boldface.
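
For orientation, the domain boundaries recited for Figures 20 and 21 can be tabulated and checked with a short Python sketch. The residue ranges are taken from the descriptions above; the data structure, the labels and the function names are assumptions introduced only for illustration.

```python
# Illustrative sketch; the layout tables mirror the residue ranges recited
# for Figures 20 and 21, while all names are hypothetical.
from typing import Dict, List, Tuple

Segment = Tuple[str, int, int]  # (label, first residue, last residue), 1-based inclusive

LAYOUT_5_8: List[Segment] = [
    ("NLS", 1, 17),
    ("ZFP", 18, 130),
    ("ZC linker", 131, 140),
    ("FokI cleavage half-domain", 141, 336),
]

LAYOUT_5_10: List[Segment] = [
    ("NLS", 1, 17),
    ("ZFP", 18, 133),
    ("ZC linker", 134, 143),
    ("FokI cleavage half-domain", 144, 339),
]


def segment_lengths(layout: List[Segment]) -> Dict[str, int]:
    """Length in residues of each labeled segment."""
    return {label: last - first + 1 for label, first, last in layout}


def is_contiguous(layout: List[Segment]) -> bool:
    """True if segments start at residue 1 and abut with no gaps or overlaps."""
    return layout[0][1] == 1 and all(
        nxt[1] == cur[2] + 1 for cur, nxt in zip(layout, layout[1:])
    )


if __name__ == "__main__":
    for name, layout in (("5-8", LAYOUT_5_8), ("5-10", LAYOUT_5_10)):
        print(name, segment_lengths(layout), is_contiguous(layout))
```

On these ranges the two fusions differ only in the length of the ZFP portion (113 versus 116 residues); the NLS, linker and cleavage half-domain segments have the same lengths in both.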

Figure 22 shows the nucleotide sequence of the enhanced Green Fluorescent
Protein gene (SEQ ID NO: 18) derived from the Aequorea victoria GFP gene (Tsien
(1998) Ann. Rev. Biochem. 67:509-544). The ATG initiation codon, as well as the
region which was mutagenized, are underlined.

Figure 23 shows the nucleotide sequence of a mutant defective eGFP gene
(SEQ ID NO: 19). Binding sites for ZFP-nucleases are underlined and the region
between the binding sites corresponds to the region that was modified.

Figure 24 shows the structures of plasmids encoding Zinc Finger Nucleases
targeted to the eGFP gene.

Figure 25 shows an autoradiogram of a 10% acrylamide gel used to analyze
targeted DNA cleavage of a mutant eGFP gene by zinc finger endonucleases. See
Example 8 for details.

Figure 26 shows the structure of plasmid pcDNA4/TO/GFPmut (see Example 9).

Figure 27 shows levels of eGFPmut mRNA, normalized to GAPDH mRNA,
in various cell lines obtained from transfection of human HEK293 cells. Light bars
show levels in untreated cells; dark bars show levels in cells that had been treated with
2 ng/ml doxycycline. See Example 9 for details.

Figure 28 shows the structure of plasmid pCR(R)4-TOPO-GFPdonor5. See
Example 10 for details.

Figure 29 shows the nucleotide sequence of the eGFP insert in
pCR(R)4-TOPO-GFPdonor5 (SEQ ID NO:20). The insert contains sequences encoding a
portion of a non-modified enhanced Green Fluorescent Protein, lacking an initiation
codon. See Example 10 for details.

Figure 30 shows a FACS trace of T18 cells transfected with plasmids
encoding two ZFP nucleases and a plasmid encoding a donor sequence, that were
arrested in the G2 phase of the cell cycle 24 hours post-transfection with 100 ng/ml
nocodazole for 48 hours. The medium was replaced and the cells were allowed to
recover for an additional 48 hours, and gene correction was measured by FACS
analysis. See Example 11 for details.

Figure 31 shows a FACS trace of T18 cells transfected with plasmids
encoding two ZFP nucleases and a plasmid encoding a donor sequence, that were
arrested in the G2 phase of the cell cycle 24 hours post-transfection with 0.2 µM
vinblastine for 48 hours. The medium was replaced and the cells were allowed to
recover for an additional 48 hours, and gene correction was measured by FACS
analysis. See Example 11 for details.

Figure 32 shows the nucleotide sequence of a 1,527 nucleotide eGFP insert in
pCR(R)4-TOPO (SEQ ID NO:21). The sequence encodes a non-modified enhanced
Green Fluorescent Protein lacking an initiation codon. See Example 13 for details.

Figure 33 shows a schematic diagram of an assay used to measure the
frequency of editing of the endogenous human IL-2Rγ gene. See Example 14 for
details.

Figure 34 shows autoradiograms of acrylamide gels used in an assay to
measure the frequency of editing of an endogenous cellular gene by targeted cleavage
and homologous recombination. The lane labeled "GFP" shows assay results from a
control in which cells were transfected with an eGFP-encoding vector; the lane
labeled "ZFPs only" shows results from another control experiment in which cells
were transfected with the two ZFP/nuclease-encoding plasmids (50 ng of each) but
not with a donor sequence. Lanes labeled "donor only" show results from a control
experiment in which cells were transfected with 1 µg of donor plasmid but not with
the ZFP/nuclease-encoding plasmids. In the experimental lanes, 50Z refers to cells
transfected with 50 ng of each ZFP/nuclease expression plasmid, 100Z refers to cells
transfected with 100 ng of each ZFP/nuclease expression plasmid, 0.5D refers to cells
transfected with 0.5 µg of the donor plasmid, and 1D refers to cells transfected with
1.0 µg of the donor plasmid. "+" refers to cells that were exposed to 0.2 µM
vinblastine; "-" refers to cells that were not exposed to vinblastine; "wt" refers to the
fragment obtained after BsrBI digestion of amplification products obtained from
chromosomes containing the wild-type chromosomal IL-2Rγ gene; "rflp" refers to the
two fragments (of approximately equal molecular weight) obtained after BsrBI
digestion of amplification products obtained from chromosomes containing sequences
from the donor plasmid which had integrated by homologous recombination.

Figure 35 shows an autoradiographic image of a four-hour exposure of a gel
used in an assay to measure targeted recombination at the human IL-2Rγ locus in
K562 cells. "wt" identifies a band that is diagnostic for chromosomal DNA
containing the native K562 IL-2Rγ sequence; "rflp" identifies a doublet diagnostic
for chromosomal DNA containing the altered IL-2Rγ sequence present in the donor
DNA molecule. The symbol "+" above a lane indicates that cells were treated with
0.2 µM vinblastine; the symbol "-" indicates that cells were not treated with
vinblastine. The numbers in the "ZFP + donor" lanes indicate the percentage of total
chromosomal DNA containing sequence originally present in the donor DNA
molecule, calculated using the "peak finder, automatic baseline" function of
Molecular Dynamics' ImageQuant v. 5.1 software as described in Ch. 8 of the
manufacturer's manual (Molecular Dynamics ImageQuant User's Guide; part
218-415). "Untr" indicates untransfected cells. See Example 15 for additional details.

Figure 36 shows an autoradiographic image of a four-hour exposure of a gel
used in an assay to measure targeted recombination at the human IL-2Rγ locus in
K562 cells. "wt" identifies a band that is diagnostic for chromosomal DNA
containing the native K562 IL-2Rγ sequence; "rflp" identifies a band that is
diagnostic for chromosomal DNA containing the altered IL-2Rγ sequence present in
the donor DNA molecule. The symbol "+" above a lane indicates that cells were
treated with 0.2 µM vinblastine; the symbol "-" indicates that cells were not treated
with vinblastine. The numbers beneath the "ZFP + donor" lanes indicate the
percentage of total chromosomal DNA containing sequence originally present in the
donor DNA molecule, calculated as described in Example 35. See Example 15 for
additional details.

Figure 37 shows an autoradiogram of a four-hour exposure of a DNA blot
probed with a fragment specific to the human IL-2Rγ gene. The arrow to the right of
the image indicates the position of a band corresponding to genomic DNA whose
sequence has been altered by homologous recombination. The symbol "+" above a
lane indicates that cells were treated with 0.2 µM vinblastine; the symbol "-" indicates
that cells were not treated with vinblastine. The numbers beneath the "ZFP + donor"
lanes indicate the percentage of total chromosomal DNA containing sequence
originally present in the donor DNA molecule, calculated as described in Example 35.
See Example 15 for additional details.

Figure 38 shows autoradiographic images of gels used in an assay to measure
targeted recombination at the human IL-2Rγ locus in CD34+ human bone marrow
cells. The left panel shows a reference standard in which the stated percentage of
normal human genomic DNA (containing a MaeII site) was added to genomic DNA
from Jurkat cells (lacking a MaeII site), the mixture was amplified by PCR to
generate a radiolabeled amplification product, and the amplification product was
digested with MaeII. "wt" identifies a band representing undigested DNA, and "rflp"
identifies a band resulting from MaeII digestion.

The right panel shows results of an experiment in which CD34+ cells were
transfected with donor DNA containing a BsrBI site and plasmids encoding zinc
finger-FokI fusion endonucleases. The relevant genomic region was then amplified
and labeled, and the labeled amplification product was digested with BsrBI. "GFP"
indicates control cells that were transfected with a GFP-encoding plasmid; "Donor
only" indicates control cells that were transfected only with donor DNA, and "ZFP +
Donor" indicates cells that were transfected with donor DNA and with plasmids
encoding the zinc finger/FokI nucleases. "wt" identifies a band that is diagnostic for
chromosomal DNA containing the native IL-2Rγ sequence; "rflp" identifies a band
that is diagnostic for chromosomal DNA containing the altered IL-2Rγ sequence
present in the donor DNA molecule. The rightmost lane contains DNA size markers.
See Example 16 for additional details.

Figure 39 shows an image of an immunoblot used to test for Ku70 protein
levels in cells transfected with Ku70-targeted siRNA. The T7 cell line (Example 9,
Figure 27) was transfected with two concentrations each of siRNA from two different
siRNA pools (see Example 18). Lane 1: 70 ng of siRNA pool D; Lane 2: 140 ng of
siRNA pool D; Lane 3: 70 ng of siRNA pool E; Lane 4: 140 ng of siRNA pool E.
"Ku70" indicates the band representing the Ku70 protein; "TFIIB" indicates a band
representing the TFIIB transcription factor, used as a control.

Figure 40 shows the amino acid sequences of four zinc finger domains
targeted to the human β-globin gene: sca-29b (SEQ ID NO:22); sca-36a (SEQ ID
NO:23); sca-36b (SEQ ID NO:24) and sca-36c (SEQ ID NO:25). The target site for
the sca-29b domain is on one DNA strand, and the target sites for the sca-36a, sca-36b
and sca-36c domains are on the opposite strand. See Example 20.

Figure 41 shows results of an in vitro assay, in which different combinations
of zinc finger/FokI fusion nucleases (ZFNs) were tested for sequence-specific DNA
cleavage. The lane labeled "U" shows a sample of the DNA template. The next four
lanes show results of incubation of the DNA template with each of four
β-globin-targeted ZFNs (see Example 20 for characterization of these ZFNs). The rightmost
three lanes show results of incubation of template DNA with the sca-29b ZFN and
one of the sca-36a, sca-36b or sca-36c ZFNs (all of which are targeted to the strand
opposite that to which sca-29b is targeted).

Figure 42 shows levels of eGFP mRNA in T18 cells (bars) as a function of
doxycycline concentration (provided on the abscissa). The number above each bar
represents the percentage correction of the eGFP mutation, in cells transfected with
donor DNA and plasmids encoding eGFP-targeted zinc finger nucleases, as a function
of doxycycline concentration.

DETAILED DESCRIPTION 

Disclosed herein are compositions and methods useful for targeted cleavage of
cellular chromatin and for targeted alteration of a cellular nucleotide sequence, e.g.,
by targeted cleavage followed by non-homologous end joining or by targeted
cleavage followed by homologous recombination between an exogenous
polynucleotide (comprising one or more regions of homology with the cellular
nucleotide sequence) and a genomic sequence. Genomic sequences include those
present in chromosomes, episomes, organellar genomes (e.g., mitochondria,
chloroplasts), artificial chromosomes and any other type of nucleic acid present in a
cell such as, for example, amplified sequences, double minute chromosomes and the
genomes of endogenous or infecting bacteria and viruses. Genomic sequences can be
normal (i.e., wild-type) or mutant; mutant sequences can comprise, for example,
insertions, deletions, translocations, rearrangements, and/or point mutations. A
genomic sequence can also comprise one of a number of different alleles.

Compositions useful for targeted cleavage and recombination include fusion
proteins comprising a cleavage domain (or a cleavage half-domain) and a zinc finger
binding domain, polynucleotides encoding these proteins and combinations of
polypeptides and polypeptide-encoding polynucleotides. A zinc finger binding
domain can comprise one or more zinc fingers (e.g., 2, 3, 4, 5, 6, 7, 8, 9 or more zinc
fingers), and can be engineered to bind to any genomic sequence. Thus, by
identifying a target genomic region of interest at which cleavage or recombination is
desired, one can, according to the methods disclosed herein, construct one or more
fusion proteins comprising a cleavage domain (or cleavage half-domain) and a zinc
finger domain engineered to recognize a target sequence in said genomic region. The
presence of such a fusion protein (or proteins) in a cell will result in binding of the
fusion protein(s) to its (their) binding site(s) and cleavage within or near said genomic
region. Moreover, if an exogenous polynucleotide homologous to the genomic region
is also present in such a cell, homologous recombination occurs at a high rate between
the genomic region and the exogenous polynucleotide.

General

Practice of the methods, as well as preparation and use of the compositions
disclosed herein employ, unless otherwise indicated, conventional techniques in
molecular biology, biochemistry, chromatin structure and analysis, computational
chemistry, cell culture, recombinant DNA and related fields as are within the skill of
the art. These techniques are fully explained in the literature. See, for example,
Sambrook et al. MOLECULAR CLONING: A LABORATORY MANUAL, Second edition,
Cold Spring Harbor Laboratory Press, 1989 and Third edition, 2001; Ausubel et al.,
CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, New York,
1987 and periodic updates; the series METHODS IN ENZYMOLOGY, Academic Press,
San Diego; Wolffe, CHROMATIN STRUCTURE AND FUNCTION, Third edition,
Academic Press, San Diego, 1998; METHODS IN ENZYMOLOGY, Vol. 304,
"Chromatin" (P.M. Wassarman and A. P. Wolffe, eds.), Academic Press, San Diego,
1999; and METHODS IN MOLECULAR BIOLOGY, Vol. 119, "Chromatin Protocols"
(P.B. Becker, ed.) Humana Press, Totowa, 1999.

Definitions 

The terms "nucleic acid," "polynucleotide," and "oligonucleotide" are used
interchangeably and refer to a deoxyribonucleotide or ribonucleotide polymer, in linear or
circular conformation, and in either single- or double-stranded form. For the purposes of
the present disclosure, these terms are not to be construed as limiting with respect to the
length of a polymer. The terms can encompass known analogues of natural nucleotides, as
well as nucleotides that are modified in the base, sugar and/or phosphate moieties (e.g.,
phosphorothioate backbones). In general, an analogue of a particular nucleotide has the
same base-pairing specificity; i.e., an analogue of A will base-pair with T.

The terms "polypeptide," "peptide" and "protein" are used interchangeably to refer
to a polymer of amino acid residues. The term also applies to amino acid polymers in
which one or more amino acids are chemical analogues or modified derivatives of
corresponding naturally-occurring amino acids.

"Binding" refers to a sequence-specific, non-covalent interaction between
macromolecules (e.g., between a protein and a nucleic acid). Not all components of a
binding interaction need be sequence-specific (e.g., contacts with phosphate residues
in a DNA backbone), as long as the interaction as a whole is sequence-specific. Such
interactions are generally characterized by a dissociation constant (Kd) of 10^-6 M or
lower. "Affinity" refers to the strength of binding; increased binding affinity is
correlated with a lower Kd.
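
The inverse relationship between Kd and affinity can be illustrated with a minimal Python sketch. It assumes simple one-to-one equilibrium binding, with Kd and the free concentration of the binding partner both expressed in molar units; under that assumption the fraction of sites occupied is [L] / (Kd + [L]). The binding model, the function name and the example concentrations are illustrative assumptions and are not taken from the disclosure.

```python
# Illustrative sketch assuming simple 1:1 equilibrium binding.
def fraction_bound(free_ligand_molar: float, kd_molar: float) -> float:
    """Fraction of binding sites occupied at equilibrium.

    free_ligand_molar: free concentration of the binding partner, in M.
    kd_molar: dissociation constant, in M (must be positive).
    """
    if free_ligand_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_ligand_molar / (kd_molar + free_ligand_molar)


if __name__ == "__main__":
    protein = 1e-7  # hypothetical free protein concentration, 100 nM
    for kd in (1e-6, 1e-8, 1e-10):
        # A lower Kd gives higher occupancy at the same concentration.
        print(f"Kd = {kd:.0e} M -> fraction bound = {fraction_bound(protein, kd):.3f}")
```

At a free concentration equal to Kd, half of the sites are occupied, which is why a lower Kd corresponds to tighter binding.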

A "binding protein" is a protein that is able to bind non-covalently to another
molecule. A binding protein can bind to, for example, a DNA molecule (a DNA-binding
protein), an RNA molecule (an RNA-binding protein) and/or a protein molecule (a
protein-binding protein). In the case of a protein-binding protein, it can bind to itself (to
form homodimers, homotrimers, etc.) and/or it can bind to one or more molecules of a
different protein or proteins. A binding protein can have more than one type of binding
activity. For example, zinc finger proteins have DNA-binding, RNA-binding and
protein-binding activity.

A "zinc finger DNA binding protein" (or binding domain) is a protein, or a domain
within a larger protein, that binds DNA in a sequence-specific manner through one or
more zinc fingers, which are regions of amino acid sequence within the binding domain
whose structure is stabilized through coordination of a zinc ion. The term zinc finger
DNA binding protein is often abbreviated as zinc finger protein or ZFP.

Zinc finger binding domains can be "engineered" to bind to a predetermined
nucleotide sequence. Non-limiting examples of methods for engineering zinc finger
proteins are design and selection. A designed zinc finger protein is a protein not
occurring in nature whose design/composition results principally from rational
criteria. Rational criteria for design include application of substitution rules and
computerized algorithms for processing information in a database storing information
of existing ZFP designs and binding data. See, for example, US Patents 6,140,081;
6,453,242; and 6,534,261; see also WO 98/53058; WO 98/53059; WO 98/53060;
WO 02/016536 and WO 03/016496.

A "selected" zinc finger protein is a protein not found in nature whose production
results primarily from an empirical process such as phage display, interaction trap or
hybrid selection. See e.g., US 5,789,538; US 5,925,523; US 6,007,988; US 6,013,453;
US 6,200,759; WO 95/19431; WO 96/06166; WO 98/53057; WO 98/54311;
WO 00/27878; WO 01/60970; WO 01/88197 and WO 02/099084.

The term "sequence" refers to a nucleotide sequence of any length, which can
be DNA or RNA; can be linear, circular or branched and can be either single-stranded
or double-stranded. The term "donor sequence" refers to a nucleotide sequence that is
inserted into a genome. A donor sequence can be of any length, for example between
2 and 10,000 nucleotides in length (or any integer value therebetween or thereabove),
preferably between about 100 and 1,000 nucleotides in length (or any integer
therebetween), more preferably between about 200 and 500 nucleotides in length.

A "homologous, non-identical sequence" refers to a first sequence which
shares a degree of sequence identity with a second sequence, but whose sequence is
not identical to that of the second sequence. For example, a polynucleotide
comprising the wild-type sequence of a mutant gene is homologous and non-identical
to the sequence of the mutant gene. In certain embodiments, the degree of homology
between the two sequences is sufficient to allow homologous recombination
therebetween, utilizing normal cellular mechanisms. Two homologous non-identical
sequences can be any length and their degree of non-homology can be as small as a
single nucleotide (e.g., for correction of a genomic point mutation by targeted
homologous recombination) or as large as 10 or more kilobases (e.g., for insertion of
a gene at a predetermined ectopic site in a chromosome). Two polynucleotides
comprising the homologous non-identical sequences need not be the same length. For
example, an exogenous polynucleotide (i.e., donor polynucleotide) of between 20 and
10,000 nucleotides or nucleotide pairs can be used.
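
To make the smallest case concrete, the following Python sketch lists the positions at which two homologous, non-identical sequences differ. It assumes the two sequences have already been aligned and are of equal length, with no insertions or deletions; the function name and the short example sequences are illustrative only and are not taken from the sequence listing.

```python
# Illustrative sketch: assumes pre-aligned, equal-length sequences (no indels).
from typing import List, Tuple


def differences(first: str, second: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, base in first, base in second) for each mismatch."""
    if len(first) != len(second):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(first.upper(), second.upper()), start=1)
        if a != b
    ]


if __name__ == "__main__":
    # Hypothetical nine-nucleotide stretches differing at a single position.
    wild_type = "CCTGAGGAG"
    mutant = "CCTGTGGAG"
    print(differences(wild_type, mutant))
```

An empty result would mean the two sequences are identical over the compared stretch; a single entry corresponds to the point-mutation case described above.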

Techniques for determining nucleic acid and amino acid sequence identity are
known in the art. Typically, such techniques include determining the nucleotide
sequence of the mRNA for a gene and/or determining the amino acid sequence
encoded thereby, and comparing these sequences to a second nucleotide or amino acid
sequence. Genomic sequences can also be determined and compared in this fashion.
In general, identity refers to an exact nucleotide-to-nucleotide or amino acid-to-amino
acid correspondence of two polynucleotides or polypeptide sequences, respectively.
Two or more sequences (polynucleotide or amino acid) can be compared by
determining their percent identity. The percent identity of two sequences, whether
nucleic acid or amino acid sequences, is the number of exact matches between two
aligned sequences divided by the length of the shorter sequence and multiplied by
100. An approximate alignment for nucleic acid sequences is provided by the local
homology algorithm of Smith and Waterman, Advances in Applied Mathematics
2:482-489 (1981). This algorithm can be applied to amino acid sequences by using
the scoring matrix developed by Dayhoff, Atlas of Protein Sequences and Structure,
M.O. Dayhoff ed., 5 suppl. 3:353-358, National Biomedical Research Foundation,
Washington, D.C., USA, and normalized by Gribskov, Nucl. Acids Res.
14(6):6745-6763 (1986). An exemplary implementation of this algorithm to determine
percent identity of a sequence is provided by the Genetics Computer Group (Madison, WI) in
the "BestFit" utility application. The default parameters for this method are described
in the Wisconsin Sequence Analysis Package Program Manual, Version 8 (1995)
(available from Genetics Computer Group, Madison, WI). A preferred method of
establishing percent identity in the context of the present disclosure is to use the
MPSRCH package of programs copyrighted by the University of Edinburgh,
developed by John F. Collins and Shane S. Sturrok, and distributed by IntelliGenetics,
Inc. (Mountain View, CA). From this suite of packages the Smith-Waterman
algorithm can be employed where default parameters are used for the scoring table
(for example, gap open penalty of 12, gap extension penalty of one, and a gap of six).
From the data generated the "Match" value reflects sequence identity. Other suitable
programs for calculating the percent identity or similarity between sequences are
generally known in the art, for example, another alignment program is BLAST, used
with default parameters. For example, BLASTN and BLASTP can be used with the
following default parameters: genetic code = standard; filter = none; strand = both;
cutoff = 60; expect = 10; Matrix = BLOSUM62; Descriptions = 50 sequences; sort by
= HIGH SCORE; Databases = non-redundant, GenBank + EMBL + DDBJ + PDB +
GenBank CDS translations + Swiss protein + Spupdate + PIR. Details of these
programs can be found at the following internet address:
http://www.ncbi.nlm.gov/cgi-bin/BLAST. With respect to sequences described
herein, the range of desired degrees of sequence identity is approximately 80% to
100% and any integer value therebetween. Typically the percent identities between
sequences are at least 70-75%, preferably 80-82%, more preferably 85-90%, even
more preferably 92%, still more preferably 95%, and most preferably 98% sequence
identity.
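
By way of illustration only, the percent identity calculation defined above (exact matches between two aligned sequences, divided by the length of the shorter sequence, multiplied by 100) can be sketched as follows. The function name, the use of "-" as the gap character, and the requirement that the two inputs are already aligned to equal length are assumptions made for this sketch; it does not perform the alignment itself and is not the BestFit, MPSRCH or BLAST implementation.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal aligned length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Exact matches: identical residues at the same aligned position (gaps never match).
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    # Length of the shorter sequence, counted without gap characters.
    shorter = min(len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, "")))
    if shorter == 0:
        raise ValueError("each sequence must contain at least one residue")
    return 100.0 * matches / shorter


# Six of seven aligned positions match, so the result is 600/7 (about 85.7).
print(round(percent_identity("GATTACA", "GATCACA"), 1))
```

Under these assumptions, a result can be compared directly against the identity thresholds recited above (for example, at least 70-75%).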

Alternatively, the degree of sequence similarity between polynucleotides can 
be determined by hybridization of polynucleotides under conditions that allow 
formation of stable duplexes between homologous regions, followed by digestion
with single-stranded-specific nuclease(s), and size determination of the digested
fragments. Two nucleic acid, or two polypeptide, sequences are substantially
homologous to each other when the sequences exhibit at least about 70%-75%,
preferably 80%-82%, more preferably 85%-90%, even more preferably 92%, still
more preferably 95%, and most preferably 98% sequence identity over a defined
length of the molecules, as determined using the methods above. As used herein,
substantially homologous also refers to sequences showing complete identity to a
specified DNA or polypeptide sequence. DNA sequences that are substantially
homologous can be identified in a Southern hybridization experiment under, for
example, stringent conditions, as defined for that particular system. Defining
appropriate hybridization conditions is within the skill of the art. See, e.g., Sambrook
et al., supra; Nucleic Acid Hybridization: A Practical Approach, editors B.D. Hames
and S.J. Higgins, (1985) Oxford; Washington, DC; IRL Press.

Selective hybridization of two nucleic acid fragments can be determined as
follows. The degree of sequence identity between two nucleic acid molecules affects
the efficiency and strength of hybridization events between such molecules. A
partially identical nucleic acid sequence will at least partially inhibit the hybridization
of a completely identical sequence to a target molecule. Inhibition of hybridization of
the completely identical sequence can be assessed using hybridization assays that are
well known in the art (e.g., Southern (DNA) blot, Northern (RNA) blot, solution
hybridization, or the like, see Sambrook, et al., Molecular Cloning: A Laboratory
Manual, Second Edition, (1989) Cold Spring Harbor, N.Y.). Such assays can be
conducted using varying degrees of selectivity, for example, using conditions varying
from low to high stringency. If conditions of low stringency are employed, the
absence of non-specific binding can be assessed using a secondary probe that lacks
even a partial degree of sequence identity (for example, a probe having less than
about 30% sequence identity with the target molecule), such that, in the absence of
non-specific binding events, the secondary probe will not hybridize to the target.

WO 2005/014791 PCT/US2004/025407
When utilizing a hybridization-based detection system, a nucleic acid probe is 
chosen that is complementary to a reference nucleic acid sequence, and then by 
selection of appropriate conditions the probe and the reference sequence selectively 
hybridize, or bind, to each other to form a duplex molecule. A nucleic acid molecule 
that is capable of hybridizing selectively to a reference sequence under moderately
stringent hybridization conditions typically hybridizes under conditions that allow
detection of a target nucleic acid sequence of at least about 10-14 nucleotides in
length having at least approximately 70% sequence identity with the sequence of the
selected nucleic acid probe. Stringent hybridization conditions typically allow
detection of target nucleic acid sequences of at least about 10-14 nucleotides in length
having a sequence identity of greater than about 90-95% with the sequence of the
selected nucleic acid probe. Hybridization conditions useful for probe/reference
sequence hybridization, where the probe and reference sequence have a specific
degree of sequence identity, can be determined as is known in the art (see, for
example, Nucleic Acid Hybridization: A Practical Approach, editors B.D. Hames and
S.J. Higgins, (1985) Oxford; Washington, DC; IRL Press).

Conditions for hybridization are well-known to those of skill in the art. 
Hybridization stringency refers to the degree to which hybridization conditions 
disfavor the formation of hybrids containing mismatched nucleotides, with higher
stringency correlated with a lower tolerance for mismatched hybrids. Factors that
affect the stringency of hybridization are well-known to those of skill in the art and
include, but are not limited to, temperature, pH, ionic strength, and concentration of
organic solvents such as, for example, formamide and dimethylsulfoxide. As is
known to those of skill in the art, hybridization stringency is increased by higher
temperatures, lower ionic strength and lower solvent concentrations.

With respect to stringency conditions for hybridization, it is well known in the 
art that numerous equivalent conditions can be employed to establish a particular 
stringency by varying, for example, the following factors: the length and nature of the 
sequences, base composition of the various sequences, concentrations of salts and
other hybridization solution components, the presence or absence of blocking agents
in the hybridization solutions (e.g., dextran sulfate, and polyethylene glycol),
hybridization reaction temperature and time parameters, as well as varying wash
conditions. A particular set of hybridization conditions is selected
following standard methods in the art (see, for example, Sambrook, et al., Molecular
Cloning: A Laboratory Manual, Second Edition, (1989) Cold Spring Harbor, N.Y.).

"Recombination" refers to a process of exchange of genetic information 
between two polynucleotides. For the purposes of this disclosure, "homologous 
recombination (HR)" refers to the specialized form of such exchange that takes place,
for example, during repair of double-strand breaks in cells. This process requires
nucleotide sequence homology, uses a "donor" molecule to template repair of a
"target" molecule (i.e., the one that experienced the double-strand break), and is
variously known as "non-crossover gene conversion" or "short tract gene conversion,"
because it leads to the transfer of genetic information from the donor to the target.
Without wishing to be bound by any particular theory, such transfer can involve
mismatch correction of heteroduplex DNA that forms between the broken target and
the donor, and/or "synthesis-dependent strand annealing," in which the donor is used
to resynthesize genetic information that will become part of the target, and/or related
processes. Such specialized HR often results in an alteration of the sequence of the
target molecule such that part or all of the sequence of the donor polynucleotide is 
incorporated into the target polynucleotide. 

"Cleavage" refers to the breakage of the covalent backbone of a DNA 
molecule. Cleavage can be initiated by a variety of methods including, but not limited
to, enzymatic or chemical hydrolysis of a phosphodiester bond. Both single-stranded
cleavage and double-stranded cleavage are possible, and double-stranded cleavage
can occur as a result of two distinct single-stranded cleavage events. DNA cleavage
can result in the production of either blunt ends or staggered ends. In certain
embodiments, fusion polypeptides are used for targeted double-stranded DNA
cleavage.

A "cleavage domain" comprises one or more polypeptide sequences which
possess catalytic activity for DNA cleavage. A cleavage domain can be contained
in a single polypeptide chain or cleavage activity can result from the association of
two (or more) polypeptides.

A "cleavage half-domain" is a polypeptide sequence which, in conjunction
with a second polypeptide (either identical or different) forms a complex having
cleavage activity (preferably double-strand cleavage activity).

"Chromatin" is the nucleoprotein structure comprising the cellular genome.
Cellular chromatin comprises nucleic acid, primarily DNA, and protein, including
histones and nonhistone chromosomal proteins. The majority of eukaryotic cellular
chromatin exists in the form of nucleosomes, wherein a nucleosome core comprises
approximately 150 base pairs of DNA associated with an octamer comprising two
each of histones H2A, H2B, H3 and H4; and linker DNA (of variable length
depending on the organism) extends between nucleosome cores. A molecule of
histone H1 is generally associated with the linker DNA. For the purposes of the
present disclosure, the term "chromatin" is meant to encompass all types of cellular
nucleoprotein, both prokaryotic and eukaryotic. Cellular chromatin includes both
chromosomal and episomal chromatin.

A "chromosome" is a chromatin complex comprising all or a portion of the
genome of a cell. The genome of a cell is often characterized by its karyotype, which
is the collection of all the chromosomes that comprise the genome of the cell. The
genome of a cell can comprise one or more chromosomes.

An "episome" is a replicating nucleic acid, nucleoprotein complex or other 
structure comprising a nucleic acid that is not part of the chromosomal karyotype of a 
cell. Examples of episomes include plasmids and certain viral genomes. 

An "accessible region" is a site in cellular chromatin in which a target site
present in the nucleic acid can be bound by an exogenous molecule which recognizes
the target site. Without wishing to be bound by any particular theory, it is believed 
that an accessible region is one that is not packaged into a nucleosomal structure. The 
distinct structure of an accessible region can often be detected by its sensitivity to 
chemical and enzymatic probes, for example, nucleases. 

A "target site" or "target sequence" is a nucleic acid sequence that defines a
portion of a nucleic acid to which a binding molecule will bind, provided sufficient
conditions for binding exist. For example, the sequence 5'-GAATTC-3' is a target
site for the EcoRI restriction endonuclease.
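
For illustration, locating occurrences of a target site within a longer nucleotide sequence can be sketched as a simple string search. The function name and the example sequence are hypothetical, and the sketch searches only the strand as written (which suffices for a palindromic site such as 5'-GAATTC-3', but not for non-palindromic sites).

```python
from typing import List


def find_target_sites(sequence: str, site: str = "GAATTC") -> List[int]:
    """Return 0-based start positions of every occurrence of `site` in `sequence`."""
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Resume one position later so overlapping occurrences are also reported.
        start = sequence.find(site, start + 1)
    return positions


print(find_target_sites("ttGAATTCaaGAATTC"))  # [2, 10]
```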

An "exogenous" molecule is a molecule that is not normally present in a cell,
but can be introduced into a cell by one or more genetic, biochemical or other
methods. "Normal presence in the cell" is determined with respect to the particular
developmental stage and environmental conditions of the cell. Thus, for example, a
molecule that is present only during embryonic development of muscle is an
exogenous molecule with respect to an adult muscle cell. Similarly, a molecule
induced by heat shock is an exogenous molecule with respect to a non-heat-shocked
cell. An exogenous molecule can comprise, for example, a functioning version of a
malfunctioning endogenous molecule or a malfunctioning version of a
normally-functioning endogenous molecule.

An exogenous molecule can be, among other things, a small molecule, such as 
is generated by a combinatorial chemistry process, or a macromolecule such as a 
protein, nucleic acid, carbohydrate, lipid, glycoprotein, lipoprotein, polysaccharide, 
any modified derivative of the above molecules, or any complex comprising one or
more of the above molecules. Nucleic acids include DNA and RNA, can be single- or
double-stranded; can be linear, branched or circular; and can be of any length.
Nucleic acids include those capable of forming duplexes, as well as triplex-forming
nucleic acids. See, for example, U.S. Patent Nos. 5,176,996 and 5,422,251. Proteins
include, but are not limited to, DNA-binding proteins, transcription factors, chromatin
remodeling factors, methylated DNA binding proteins, polymerases, methylases,
demethylases, acetylases, deacetylases, kinases, phosphatases, integrases, 
recombinases, ligases, topoisomerases, gyrases and helicases. 

An exogenous molecule can be the same type of molecule as an endogenous 
molecule, e.g., an exogenous protein or nucleic acid. For example, an exogenous
nucleic acid can comprise an infecting viral genome, a plasmid or episome introduced
into a cell, or a chromosome that is not normally present in the cell. Methods for the
introduction of exogenous molecules into cells are known to those of skill in the art
and include, but are not limited to, lipid-mediated transfer (i.e., liposomes, including
neutral and cationic lipids), electroporation, direct injection, cell fusion, particle
bombardment, calcium phosphate co-precipitation, DEAE-dextran-mediated transfer
and viral vector-mediated transfer. 

By contrast, an "endogenous" molecule is one that is normally present in a 
particular cell at a particular developmental stage under particular environmental 
conditions. For example, an endogenous nucleic acid can comprise a chromosome,
the genome of a mitochondrion, chloroplast or other organelle, or a
naturally-occurring episomal nucleic acid. Additional endogenous molecules can include
proteins, for example, transcription factors and enzymes.

A "fusion" molecule is a molecule in which two or more subunit molecules are
linked, preferably covalently. The subunit molecules can be the same chemical type
of molecule, or can be different chemical types of molecules. Examples of the first
type of fusion molecule include, but are not limited to, fusion proteins (for example, a
fusion between a ZFP DNA-binding domain and a cleavage domain) and fusion
nucleic acids (for example, a nucleic acid encoding the fusion protein described
supra). Examples of the second type of fusion molecule include, but are not limited 
to, a fusion between a triplex-forming nucleic acid and a polypeptide, and a fusion 
between a minor groove binder and a nucleic acid. 

Expression of a fusion protein in a cell can result from delivery of the fusion
protein to the cell or from delivery of a polynucleotide encoding the fusion protein to a
cell, wherein the polynucleotide is transcribed, and the transcript is translated, to
generate the fusion protein. Trans-splicing, polypeptide cleavage and polypeptide
ligation can also be involved in expression of a protein in a cell. Methods for
polynucleotide and polypeptide delivery to cells are presented elsewhere in this
disclosure. 

A "gene," for the purposes of the present disclosure, includes a DNA region 
encoding a gene product (see infra), as well as all DNA regions which regulate the 
production of the gene product, whether or not such regulatory sequences are adjacent
to coding and/or transcribed sequences. Accordingly, a gene includes, but is not
necessarily limited to, promoter sequences, terminators, translational regulatory 
sequences such as ribosome binding sites and internal ribosome entry sites, enhancers, 
silencers, insulators, boundary elements, replication origins, matrix attachment sites 
and locus control regions. 

"Gene expression" refers to the conversion of the information, contained in a
gene, into a gene product. A gene product can be the direct transcriptional product of
a gene (e.g., mRNA, tRNA, rRNA, antisense RNA, ribozyme, structural RNA or any
other type of RNA) or a protein produced by translation of a mRNA. Gene products
also include RNAs which are modified, by processes such as capping,
polyadenylation, methylation, and editing, and proteins modified by, for example,
methylation, acetylation, phosphorylation, ubiquitination, ADP-ribosylation,
myristoylation, and glycosylation.

"Modulation" of gene expression refers to a change in the activity of a gene.
Modulation of expression can include, but is not limited to, gene activation and gene
repression.

"Eucaryotic" cells include, but are not limited to, fungal cells (such as yeast),
plant cells, animal cells, mammalian cells and human cells.

A "region of interest" is any region of cellular chromatin, such as, for 
example, a gene or a non-coding sequence within or adjacent to a gene, in which it is 
desirable to bind an exogenous molecule. Binding can be for the purposes of targeted 
DNA cleavage and/or targeted recombination. A region of interest can be present in a 
chromosome, an episome, an organellar genome (e.g., mitochondrial, chloroplast), or
an infecting viral genome, for example. A region of interest can be within the coding
region of a gene, within transcribed non-coding regions such as, for example, leader
sequences, trailer sequences or introns, or within non-transcribed regions, either
upstream or downstream of the coding region. A region of interest can be as small as
a single nucleotide pair or up to 2,000 nucleotide pairs in length, or any integral value
of nucleotide pairs. 

The terms "operative linkage" and "operatively linked" (or "operably linked") 
are used interchangeably with reference to a juxtaposition of two or more components 
(such as sequence elements), in which the components are arranged such that both
components function normally and allow the possibility that at least one of the
components can mediate a function that is exerted upon at least one of the other
components. By way of illustration, a transcriptional regulatory sequence, such as a
promoter, is operatively linked to a coding sequence if the transcriptional regulatory
sequence controls the level of transcription of the coding sequence in response to the
presence or absence of one or more transcriptional regulatory factors. A
transcriptional regulatory sequence is generally operatively linked in cis with a coding
sequence, but need not be directly adjacent to it. For example, an enhancer is a 
transcriptional regulatory sequence that is operatively linked to a coding sequence, 
even though they are not contiguous. 

With respect to fusion polypeptides, the term "operatively linked" can refer to
the fact that each of the components performs the same function in linkage to the
other component as it would if it were not so linked. For example, with respect to a
fusion polypeptide in which a ZFP DNA-binding domain is fused to a cleavage
domain, the ZFP DNA-binding domain and the cleavage domain are in operative
linkage if, in the fusion polypeptide, the ZFP DNA-binding domain portion is able to
bind its target site and/or its binding site, while the cleavage domain is able to cleave
DNA in the vicinity of the target site.

A "functional fragment" of a protein, polypeptide or nucleic acid is a protein,
polypeptide or nucleic acid whose sequence is not identical to the full-length protein,
polypeptide or nucleic acid, yet retains the same function as the full-length protein,
polypeptide or nucleic acid. A functional fragment can possess more, fewer, or the
same number of residues as the corresponding native molecule, and/or can contain
one or more amino acid or nucleotide substitutions. Methods for determining the
function of a nucleic acid (e.g., coding function, ability to hybridize to another nucleic
acid) are well-known in the art. Similarly, methods for determining protein function
are well-known. For example, the DNA-binding function of a polypeptide can be
determined by filter-binding, electrophoretic mobility-shift, or
immunoprecipitation assays. DNA cleavage can be assayed by gel electrophoresis.
See Ausubel et al., supra. The ability of a protein to interact with another protein can
be determined, for example, by co-immunoprecipitation, two-hybrid assays or
complementation, both genetic and biochemical. See, for example, Fields et al.
(1989) Nature 340:245-246; U.S. Patent No. 5,585,245 and PCT WO 98/44350.

Target sites

The disclosed methods and compositions include fusion proteins comprising a
cleavage domain (or a cleavage half-domain) and a zinc finger domain, in which the
zinc finger domain, by binding to a sequence in cellular chromatin (e.g., a target site
or a binding site), directs the activity of the cleavage domain (or cleavage
half-domain) to the vicinity of the sequence and, hence, induces cleavage in the vicinity of
the target sequence. As set forth elsewhere in this disclosure, a zinc finger domain
can be engineered to bind to virtually any desired sequence. Accordingly, after
identifying a region of interest containing a sequence at which cleavage or
recombination is desired, one or more zinc finger binding domains can be engineered
to bind to one or more sequences in the region of interest. Expression of a fusion
protein comprising a zinc finger binding domain and a cleavage domain (or of two
fusion proteins, each comprising a zinc finger binding domain and a cleavage
half-domain), in a cell, effects cleavage in the region of interest.

Selection of a sequence in cellular chromatin for binding by a zinc finger 
domain (e.g., a target site) can be accomplished, for example, according to the 
methods disclosed in co-owned US Patent No. 6,453,242 (Sept. 17, 2002), which also
discloses methods for designing ZFPs to bind to a selected sequence. It will be clear 
to those skilled in the art that simple visual inspection of a nucleotide sequence can 
also be used for selection of a target site. Accordingly, any means for target site 
selection can be used in the claimed methods. 

Target sites are generally composed of a plurality of adjacent target subsites.
A target subsite refers to the sequence (usually either a nucleotide triplet, or a
nucleotide quadruplet that can overlap by one nucleotide with an adjacent quadruplet)
bound by an individual zinc finger. See, for example, WO 02/077227. If the strand
with which a zinc finger protein makes most contacts is designated the target strand
(the "primary recognition strand" or "primary contact strand"), some zinc finger proteins
bind to a three base triplet in the target strand and a fourth base on the non-target
strand. A target site generally has a length of at least 9 nucleotides and, accordingly,
is bound by a zinc finger binding domain comprising at least three zinc fingers.
However, binding of, for example, a 4-finger binding domain to a 12-nucleotide target
site, a 5-finger binding domain to a 15-nucleotide target site or a 6-finger binding
domain to an 18-nucleotide target site, is also possible. As will be apparent, binding
of larger binding domains (e.g., 7-, 8-, 9-finger and more) to longer target sites is also
possible.

It is not necessary for a target site to be a multiple of three nucleotides. For 
example, in cases in which cross-strand interactions occur (see, e.g., US Patent
6,453,242 and WO 02/077227), one or more of the individual zinc fingers of a
multi-finger binding domain can bind to overlapping quadruplet subsites. As a result, a
three-finger protein can bind a 10-nucleotide sequence, wherein the tenth nucleotide is
part of a quadruplet bound by a terminal finger, a four-finger protein can bind a
13-nucleotide sequence, wherein the thirteenth nucleotide is part of a quadruplet bound
by a terminal finger, etc. 
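
The arithmetic relating the number of fingers to target site length can be sketched as follows, purely by way of example. The function and parameter names are hypothetical; the sketch assumes contiguous subsites and models the cross-strand case as a single extra nucleotide contributed by a terminal finger, consistent with the 10- and 13-nucleotide examples above.

```python
def target_site_length(num_fingers: int, terminal_quadruplet: bool = False) -> int:
    """Length in nucleotides of a contiguous target site bound by `num_fingers` fingers."""
    if num_fingers < 1:
        raise ValueError("a binding domain comprises at least one zinc finger")
    # Each finger contributes a triplet; overlapping quadruplets share one nucleotide
    # with their neighbor, so only the terminal quadruplet adds a nucleotide.
    return 3 * num_fingers + (1 if terminal_quadruplet else 0)


print(target_site_length(3))        # 9
print(target_site_length(3, True))  # 10
print(target_site_length(4, True))  # 13
```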

The length and nature of amino acid linker sequences between individual zinc 
fingers in a multi-finger binding domain also affect binding to a target sequence. For
example, the presence of a so-called "non-canonical linker," "long linker" or
"structured linker" between adjacent zinc fingers in a multi-finger binding domain can
allow those fingers to bind subsites which are not immediately adjacent. Non-limiting
examples of such linkers are described, for example, in US Patent No. 6,479,626 and
WO 01/53480. Accordingly, one or more subsites, in a target site for a zinc finger
binding domain, can be separated from each other by 1, 2, 3, 4, 5 or more nucleotides.
To provide but one example, a four-finger binding domain can bind to a 13-nucleotide
target site comprising, in sequence, two contiguous 3-nucleotide subsites, an 
intervening nucleotide, and two contiguous triplet subsites. 
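
As a minimal sketch of the example just given, the positions of individual subsites within a gapped target site can be computed from the subsite lengths and the number of intervening nucleotides between adjacent subsites. The function name, the 0-based half-open coordinate convention and the list-based inputs are assumptions made for illustration.

```python
from typing import List, Tuple


def subsite_spans(subsite_lengths: List[int], gaps: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) spans of each subsite; gaps[i] lies between subsites i and i+1."""
    if len(gaps) != len(subsite_lengths) - 1:
        raise ValueError("need exactly one gap value between each pair of adjacent subsites")
    spans = []
    position = 0
    for index, length in enumerate(subsite_lengths):
        spans.append((position, position + length))
        position += length
        if index < len(gaps):
            position += gaps[index]  # skip nucleotides not contacted by a finger
    return spans


# Four triplet subsites with one intervening nucleotide after the second subsite.
spans = subsite_spans([3, 3, 3, 3], [0, 1, 0])
print(spans)         # [(0, 3), (3, 6), (7, 10), (10, 13)]
print(spans[-1][1])  # 13, the total target site length
```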

Distance between sequences (e.g., target sites) refers to the number of
nucleotides or nucleotide pairs intervening between two sequences, as measured from
the edges of the sequences nearest each other. 
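
The distance definition above can be expressed, for illustration, as a short calculation on sequence coordinates. The representation of each sequence as a 0-based half-open (start, end) interval on a common coordinate system, and the function name, are assumptions of this sketch.

```python
from typing import Tuple


def distance_between_sites(site_a: Tuple[int, int], site_b: Tuple[int, int]) -> int:
    """Number of nucleotides intervening between the nearest edges of two sites."""
    # Order the two sites by start coordinate so the argument order does not matter.
    (first_start, first_end), (second_start, second_end) = sorted([site_a, site_b])
    if second_start < first_end:
        raise ValueError("sites overlap; no intervening nucleotides are defined")
    return second_start - first_end


# A site occupying positions 10-18 and a site occupying positions 25-33
# are separated by the six nucleotides at positions 19-24.
print(distance_between_sites((10, 19), (25, 34)))  # 6
```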

In certain embodiments in which cleavage depends on the binding of two zinc 
finger domain/cleavage half-domain fusion molecules to separate target sites, the two
target sites can be on opposite DNA strands. In other embodiments, both target sites
are on the same DNA strand. 

Zinc finger binding domains 

A zinc finger binding domain comprises one or more zinc fingers. Miller et
al. (1985) EMBO J. 4:1609-1614; Rhodes (1993) Scientific American Feb.:56-65;
US Patent No. 6,453,242. Typically, a single zinc finger domain is about 30 amino
acids in length. Structural studies have demonstrated that each zinc finger domain
(motif) contains two beta sheets (held in a beta turn which contains the two invariant
cysteine residues) and an alpha helix (containing the two invariant histidine residues),
which are held in a particular conformation through coordination of a zinc atom by
the two cysteines and the two histidines.

Zinc fingers include both canonical C2H2 zinc fingers (i.e., those in which the
zinc ion is coordinated by two cysteine and two histidine residues) and non-canonical
zinc fingers such as, for example, C3H zinc fingers (those in which the zinc ion is
coordinated by three cysteine residues and one histidine residue) and C4 zinc fingers
(those in which the zinc ion is coordinated by four cysteine residues). See also 
WO 02/057293.

Zinc finger binding domains can be engineered to bind to a sequence of
choice. See, for example, Beerli et al. (2002) Nature Biotechnol. 20:135-141; Pabo et
al. (2001) Ann. Rev. Biochem. 70:313-340; Isalan et al. (2001) Nature Biotechnol.
19:656-660; Segal et al. (2001) Curr. Opin. Biotechnol. 12:632-637; Choo et al.
(2000) Curr. Opin. Struct. Biol. 10:411-416. An engineered zinc finger binding
domain can have a novel binding specificity, compared to a naturally-occurring zinc
finger protein. Engineering methods include, but are not limited to, rational design
and various types of selection. Rational design includes, for example, using databases
comprising triplet (or quadruplet) nucleotide sequences and individual zinc finger
amino acid sequences, in which each triplet or quadruplet nucleotide sequence is
associated with one or more amino acid sequences of zinc fingers which bind the
particular triplet or quadruplet sequence. See, for example, co-owned U.S. Patents
6,453,242 and 6,534,261.
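
The database lookup underlying rational design can be sketched, in a purely hypothetical form, as a mapping from subsite sequences to candidate zinc finger designs. The table contents below are placeholder labels, not actual zinc finger amino acid sequences, and the table name, function name and restriction to contiguous triplet subsites are assumptions of this sketch rather than the method of the cited patents. The ordering of fingers relative to the target strand is not addressed.

```python
from typing import Dict, List

# Hypothetical placeholder table: each triplet maps to labels standing in for
# zinc finger amino acid sequences. These labels are NOT real designs.
FINGER_DATABASE: Dict[str, List[str]] = {
    "GCG": ["finger_design_01"],
    "TGG": ["finger_design_02", "finger_design_03"],
    "GAA": ["finger_design_04"],
}


def candidate_fingers(target_site: str,
                      database: Dict[str, List[str]] = FINGER_DATABASE) -> List[List[str]]:
    """Split a target site into contiguous triplets and look up candidates for each."""
    target_site = target_site.upper()
    if len(target_site) % 3 != 0:
        raise ValueError("this sketch handles only contiguous triplet subsites")
    triplets = [target_site[i:i + 3] for i in range(0, len(target_site), 3)]
    missing = [t for t in triplets if t not in database]
    if missing:
        raise KeyError(f"no finger designs recorded for subsites: {missing}")
    return [database[t] for t in triplets]


print(candidate_fingers("GCGTGGGAA"))
```

A nine-nucleotide site thus yields three lists of candidates, one per finger of a three-finger binding domain.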

Exemplary selection methods, including phage display and two-hybrid
systems, are disclosed in US Patents 5,789,538; 5,925,523; 6,007,988; 6,013,453;
6,410,248; 6,140,466; 6,200,759; and 6,242,568; as well as WO 98/37186;
WO 98/53057; WO 00/27878; WO 01/88197 and GB 2,338,237. 

Enhancement of binding specificity for zinc finger binding domains has been 
described, for example, in co-owned WO 02/077227. 

Since an individual zinc finger binds to a three-nucleotide (i.e., triplet)
sequence (or a four-nucleotide sequence which can overlap, by one nucleotide, with
the four-nucleotide binding site of an adjacent zinc finger), the length of a sequence to
which a zinc finger binding domain is engineered to bind (e.g., a target sequence) will
determine the number of zinc fingers in an engineered zinc finger binding domain.
For example, for ZFPs in which the finger motifs do not bind to overlapping subsites,
a six-nucleotide target sequence is bound by a two-finger binding domain; a
nine-nucleotide target sequence is bound by a three-finger binding domain, etc. As noted
herein, binding sites for individual zinc fingers (i.e., subsites) in a target site need not
be contiguous, but can be separated by one or several nucleotides, depending on the
length and nature of the amino acid sequences between the zinc fingers (i.e., the
inter-finger linkers) in a multi-finger binding domain.

In a multi-finger zinc finger binding domain, adjacent zinc fingers can be 
separated by amino acid linker sequences of approximately 5 amino acids (so-called
"canonical" inter-finger linkers) or, alternatively, by one or more non-canonical
linkers. See, e.g., co-owned US Patent Nos. 6,453,242 and 6,534,261. For
engineered zinc finger binding domains comprising more than three fingers, insertion
of longer ("non-canonical") inter-finger linkers between certain of the zinc fingers
may be preferred as it may increase the affinity and/or specificity of binding by the
binding domain. See, for example, U.S. Patent No. 6,479,626 and WO 01/53480.
Accordingly, multi-finger zinc finger binding domains can also be characterized with
respect to the presence and location of non-canonical inter-finger linkers. For
example, a six-finger zinc finger binding domain comprising three fingers (joined by
two canonical inter-finger linkers), a long linker and three additional fingers (joined
by two canonical inter-finger linkers) is denoted a 2x3 configuration. Similarly, a
binding domain comprising two fingers (with a canonical linker therebetween), a long
linker and two additional fingers (joined by a canonical linker) is denoted a 2x2
protein. A protein comprising three two-finger units (in each of which the two fingers
are joined by a canonical linker), and in which each two-finger unit is joined to the
adjacent two-finger unit by a long linker, is referred to as a 3x2 protein.

The presence of a long or non-canonical inter-finger linker between two
adjacent zinc fingers in a multi-finger binding domain often allows the two fingers to
bind to subsites which are not immediately contiguous in the target sequence.
Accordingly, there can be gaps of one or more nucleotides between subsites in a
target site; i.e., a target site can contain one or more nucleotides that are not contacted
by a zinc finger. For example, a 2x2 zinc finger binding domain can bind to two
six-nucleotide sequences separated by one nucleotide, i.e., it binds to a 13-nucleotide
target site. See also Moore et al. (2001a) Proc. Natl. Acad. Sci. USA 98:1432-1436;
Moore et al. (2001b) Proc. Natl. Acad. Sci. USA 98:1437-1441 and WO 01/53480.
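
The arithmetic behind the 13-nucleotide example can be sketched as below. This is a minimal illustration that assumes three-nucleotide subsites, a fixed gap at every long linker, and a hypothetical function name; actual gap sizes depend on the linker used.

```python
def target_site_length(units, fingers_per_unit, gap_nt, subsite_nt=3):
    """Total target-site length for a 'units x fingers_per_unit' configuration.

    Fingers within a unit bind contiguous subsites; each long linker between
    adjacent units is assumed to skip gap_nt uncontacted nucleotides.
    """
    if units < 1 or fingers_per_unit < 1 or gap_nt < 0:
        raise ValueError("units and fingers must be positive; gap must be non-negative")
    contacted = units * fingers_per_unit * subsite_nt
    skipped = (units - 1) * gap_nt
    return contacted + skipped


print(target_site_length(units=2, fingers_per_unit=2, gap_nt=1))  # 2x2 protein
print(target_site_length(units=2, fingers_per_unit=3, gap_nt=1))  # 2x3 configuration
print(target_site_length(units=3, fingers_per_unit=2, gap_nt=1))  # 3x2 protein
```
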

As mentioned previously, a target subsite is a three- or four-nucleotide
sequence that is bound by a single zinc finger. For certain purposes, a two-finger unit
is denoted a binding module. A binding module can be obtained by, for example,
selecting for two adjacent fingers in the context of a multi-finger protein (generally
three fingers) which bind a particular six-nucleotide target sequence. Alternatively,
modules can be constructed by assembly of individual zinc fingers. See also
WO 98/53057 and WO 01/53480.

Cleavage domains

The cleavage domain portion of the fusion proteins disclosed herein can be
obtained from any endo- or exonuclease. Exemplary endonucleases from which a
cleavage domain can be derived include, but are not limited to, restriction
endonucleases and homing endonucleases. See, for example, 2002-2003 Catalogue,
New England Biolabs, Beverly, MA; and Belfort et al. (1997) Nucleic Acids Res.
25:3379-3388. Additional enzymes which cleave DNA are known (e.g., S1 Nuclease;
mung bean nuclease; pancreatic DNase I; micrococcal nuclease; yeast HO
endonuclease; see also Linn et al. (eds.) Nucleases, Cold Spring Harbor Laboratory
Press, 1993). One or more of these enzymes (or functional fragments thereof) can be
used as a source of cleavage domains and cleavage half-domains.

Similarly, a cleavage half-domain (e.g., fusion proteins comprising a zinc
finger binding domain and a cleavage half-domain) can be derived from any nuclease
or portion thereof, as set forth above, that requires dimerization for cleavage activity.
In general, two fusion proteins are required for cleavage if the fusion proteins
comprise cleavage half-domains. The two cleavage half-domains can be derived from
the same endonuclease (or functional fragments thereof), or each cleavage
half-domain can be derived from a different endonuclease (or functional fragments
thereof). In addition, the target sites for the two fusion proteins are preferably
disposed, with respect to each other, such that binding of the two fusion proteins
places the cleavage half-domains in a spatial orientation to each other that allows the
cleavage half-domains to form a functional cleavage domain, e.g., by dimerizing.
Thus, in certain embodiments, the near edges of the target sites are separated by 5-8
nucleotides or by 15-18 nucleotides. However, any integral number of nucleotides or
nucleotide pairs can intervene between two target sites (e.g., from 2 to 50 nucleotides
or more). In general, the point of cleavage lies between the target sites.
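
The near-edge separation described above can be computed as in the following sketch. The coordinate convention (0-based, end-exclusive intervals on one reference strand) and all names are assumptions made for illustration; the ranges 5-8 and 15-18 are those recited for certain embodiments.

```python
# Separations recited for certain embodiments: 5-8 and 15-18 nucleotides.
PREFERRED_GAPS = (range(5, 9), range(15, 19))


def near_edge_gap(site_a, site_b):
    """Nucleotides lying between the near edges of two non-overlapping sites.

    Each site is a (start, end) pair in 0-based, end-exclusive coordinates.
    """
    first, second = sorted([site_a, site_b])
    gap = second[0] - first[1]
    if gap < 0:
        raise ValueError("target sites overlap")
    return gap


def in_preferred_range(gap):
    """True if the gap falls in one of the recited separation ranges."""
    return any(gap in allowed for allowed in PREFERRED_GAPS)


gap = near_edge_gap((0, 9), (15, 24))  # two hypothetical 9-nucleotide sites
print(gap, in_preferred_range(gap))
```

Because any integral separation can intervene, a result outside the listed ranges does not by itself rule out cleavage.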

In general, if two fusion proteins are used, each comprising a cleavage
half-domain, the primary contact strand for the zinc finger portion of each fusion protein
will be on a different DNA strand and in opposite orientation. That is, for a pair of
ZFP/cleavage half-domain fusions, the target sequences are on opposite strands and
the two proteins bind in opposite orientations.

Restriction endonucleases (restriction enzymes) are present in many species
and are capable of sequence-specific binding to DNA (at a recognition site), and
cleaving DNA at or near the site of binding. Certain restriction enzymes (e.g., Type
IIS) cleave DNA at sites removed from the recognition site and have separable
binding and cleavage domains. For example, the Type IIS enzyme Fok I catalyzes
double-stranded cleavage of DNA, at 9 nucleotides from its recognition site on one
strand and 13 nucleotides from its recognition site on the other. See, for example, US
Patents 5,356,802; 5,436,150 and 5,487,994; as well as Li et al. (1992) Proc. Natl.
Acad. Sci. USA 89:4275-4279; Li et al. (1993) Proc. Natl. Acad. Sci. USA 90:2764-2768;
Kim et al. (1994a) Proc. Natl. Acad. Sci. USA 91:883-887; Kim et al. (1994b)
J. Biol. Chem. 269:31,978-31,982. Thus, in one embodiment, fusion proteins
comprise the cleavage domain (or cleavage half-domain) from at least one Type IIS
restriction enzyme and one or more zinc finger binding domains, which may or may
not be engineered.
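
The 9/13 cleavage geometry of native Fok I can be sketched as follows. The recognition sequence 5'-GGATG-3' is taken from general knowledge of the enzyme and is not recited in this passage, so it should be treated as an assumption; the function name and example sequence are hypothetical, and only sites written on the given (top) strand are considered.

```python
FOK_I_SITE = "GGATG"  # assumed recognition sequence; not stated in the text above
TOP_OFFSET = 9        # nucleotides past the site on one strand
BOTTOM_OFFSET = 13    # nucleotides past the site on the other strand


def fok_i_cuts(top_strand):
    """Return (top_cut, bottom_cut) index pairs for each site on the top strand.

    A cut index n means the backbone is cut between bases n-1 and n, counted
    along the top strand. Sites too close to the end of the sequence are skipped.
    """
    cuts = []
    start = top_strand.find(FOK_I_SITE)
    while start != -1:
        site_end = start + len(FOK_I_SITE)
        top_cut = site_end + TOP_OFFSET
        bottom_cut = site_end + BOTTOM_OFFSET
        if bottom_cut <= len(top_strand):
            cuts.append((top_cut, bottom_cut))
        start = top_strand.find(FOK_I_SITE, start + 1)
    return cuts


example = "TT" + FOK_I_SITE + "A" * 20  # hypothetical substrate
for top_cut, bottom_cut in fok_i_cuts(example):
    print(top_cut, bottom_cut, bottom_cut - top_cut)  # last value is the nick offset
```

The difference between the two cut positions is the four-nucleotide offset between the single-strand breaks made by the native enzyme.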

An exemplary Type IIS restriction enzyme, whose cleavage domain is
separable from the binding domain, is Fok I. This particular enzyme is active as a
dimer. Bitinaite et al. (1998) Proc. Natl. Acad. Sci. USA 95:10,570-10,575.
Accordingly, for the purposes of the present disclosure, the portion of the Fok I
enzyme used in the disclosed fusion proteins is considered a cleavage half-domain.
Thus, for targeted double-stranded cleavage and/or targeted replacement of cellular
sequences using zinc finger-Fok I fusions, two fusion proteins, each comprising a
Fok I cleavage half-domain, can be used to reconstitute a catalytically active cleavage
domain. Alternatively, a single polypeptide molecule containing a zinc finger binding
domain and two Fok I cleavage half-domains can also be used. Parameters for
targeted cleavage and targeted sequence alteration using zinc finger-Fok I fusions are
provided elsewhere in this disclosure.

Exemplary Type IIS restriction enzymes are listed in Table 1. Additional
restriction enzymes also contain separable binding and cleavage domains, and these
are contemplated by the present disclosure. See, for example, Roberts et al. (2003)
Nucleic Acids Res. 31:418-420.

Table 1: Some Type IIS Restriction Enzymes

Aar I        BsrB I       SspD5 I
Ace III      BsrD I       Sth132 I
Aci I        BstF5 I      Sts I
Alo I        Btr I        TspDT I
Bae I        Bts I        TspGW I
Bbr7 I       Cdi I        Tth111 II
Bbv I        CjeP I       UbaP I
Bbv II       Drd II       Bsa I
BbvC I       Eci I        BsmB I
Bcc I        Eco31 I
Bce83 I      Eco57 I
BceA I       Eco57M I
Bcef I       Esp3 I
Bcg I        Fau I
BciV I       Fin I
Bfi I        Fok I
Bin I        Gdi II
Bmg I        Gsu I
Bpu10 I      Hga I
BsaX I       Hin4 II
Bsb I        Hph I
BscA I       Ksp632 I
BscG I       Mbo II
BseR I       Mly I
BseY I       Mme I
Bsi I        Mnl I
Bsm I        Pfl1108 I
BsmA I       Ple I
BsmF I       Ppi I
Bsp24 I      Psr I
BspG I       RleA I
BspM I       Sap I
BspNC I      SfaN I
Bsr I        Sim I

Zinc finger domain-cleavage domain fusions 

Methods for design and construction of fusion proteins (and polynucleotides
encoding same) are known to those of skill in the art. For example, methods for the
design and construction of fusion proteins comprising zinc finger proteins (and
polynucleotides encoding same) are described in co-owned US Patents 6,453,242 and
6,534,261. In certain embodiments, polynucleotides encoding such fusion proteins
are constructed. These polynucleotides can be inserted into a vector and the vector
can be introduced into a cell (see below for additional disclosure regarding vectors
and methods for introducing polynucleotides into cells).

In certain embodiments of the methods described herein, a fusion protein
comprises a zinc finger binding domain and a cleavage half-domain from the Fok I
restriction enzyme, and two such fusion proteins are expressed in a cell. Expression
of two fusion proteins in a cell can result from delivery of the two proteins to the cell;
delivery of one protein and one nucleic acid encoding one of the proteins to the cell;
delivery of two nucleic acids, each encoding one of the proteins, to the cell; or by
delivery of a single nucleic acid, encoding both proteins, to the cell. In additional
embodiments, a fusion protein comprises a single polypeptide chain comprising two
cleavage half-domains and a zinc finger binding domain. In this case, a single fusion
protein is expressed in a cell and, without wishing to be bound by theory, is believed
to cleave DNA as a result of formation of an intramolecular dimer of the cleavage
half-domains.

In general, the components of the fusion proteins (e.g., ZFP-Fok I fusions) are
arranged such that the zinc finger domain is nearest the amino terminus of the fusion
protein, and the cleavage half-domain is nearest the carboxy-terminus. This mirrors
the relative orientation of the cleavage domain in naturally-occurring dimerizing
cleavage domains such as those derived from the Fok I enzyme, in which the
DNA-binding domain is nearest the amino terminus and the cleavage half-domain is
nearest the carboxy terminus.

In the disclosed fusion proteins, the amino acid sequence between the zinc
finger binding domain (which is delimited by the N-terminal-most of the two
conserved cysteine residues and the C-terminal-most of the two conserved histidine
residues) and the cleavage domain (or half-domain) is denoted the "ZC linker." The
ZC linker is to be distinguished from the inter-finger linkers discussed above. For
instance, in a ZFP-Fok I fusion protein (in which the components are arranged: N
terminus-zinc finger binding domain-Fok I cleavage half-domain-C terminus), the ZC
linker is located between the second histidine residue of the C-terminal-most zinc
finger and the N-terminal-most amino acid residue of the cleavage half-domain
(which is generally glutamine (Q) in the sequence QLV). The ZC linker can be any
amino acid sequence. To obtain optimal cleavage, the length of the linker and the
distance between the target sites (binding sites) are interrelated. See, for example,
Smith et al. (2000) Nucleic Acids Res. 28:3361-3369; Bibikova et al. (2001) Mol.
Cell. Biol. 21:289-297, noting that their notation for linker length differs from that
given here. For example, for ZFP-Fok I fusions having a ZC linker length of four
amino acids (as defined herein), optimal cleavage occurs when the binding sites for
the fusion proteins are located 6 or 16 nucleotides apart (as measured from the near
edge of each binding site).

Methods for targeted cleavage

The disclosed methods and compositions can be used to cleave DNA at a
region of interest in cellular chromatin (e.g., at a desired or predetermined site in a
genome, for example, in a gene, either mutant or wild-type). For such targeted DNA
cleavage, a zinc finger binding domain is engineered to bind a target site at or near the
predetermined cleavage site, and a fusion protein comprising the engineered zinc
finger binding domain and a cleavage domain is expressed in a cell. Upon binding of
the zinc finger portion of the fusion protein to the target site, the DNA is cleaved near
the target site by the cleavage domain. The exact site of cleavage can depend on the
length of the ZC linker.

Alternatively, two fusion proteins, each comprising a zinc finger binding
domain and a cleavage half-domain, are expressed in a cell, and bind to target sites
which are juxtaposed in such a way that a functional cleavage domain is reconstituted
and DNA is cleaved in the vicinity of the target sites. In one embodiment, cleavage
occurs between the target sites of the two zinc finger binding domains. One or both
of the zinc finger binding domains can be engineered.

For targeted cleavage using a zinc finger binding domain-cleavage domain
fusion polypeptide, the binding site can encompass the cleavage site, or the near edge
of the binding site can be 1, 2, 3, 4, 5, 6, 10, 25, 50 or more nucleotides (or any
integral value between 1 and 50 nucleotides) from the cleavage site. The exact
location of the binding site, with respect to the cleavage site, will depend upon the
particular cleavage domain, and the length of the ZC linker. For methods in which
two fusion polypeptides, each comprising a zinc finger binding domain and a
cleavage half-domain, are used, the binding sites generally straddle the cleavage site.
Thus the near edge of the first binding site can be 1, 2, 3, 4, 5, 6, 10, 25 or more
nucleotides (or any integral value between 1 and 50 nucleotides) on one side of the
cleavage site, and the near edge of the second binding site can be 1, 2, 3, 4, 5, 6, 10,
25 or more nucleotides (or any integral value between 1 and 50 nucleotides) on the
other side of the cleavage site. Methods for mapping cleavage sites in vitro and in
vivo are known to those of skill in the art.

Thus, the methods described herein can employ an engineered zinc finger
binding domain fused to a cleavage domain. In these cases, the binding domain is
engineered to bind to a target sequence, at or near which cleavage is desired. The
fusion protein, or a polynucleotide encoding same, is introduced into a cell. Once
introduced into, or expressed in, the cell, the fusion protein binds to the target
sequence and cleaves at or near the target sequence. The exact site of cleavage
depends on the nature of the cleavage domain and/or the presence and/or nature of
linker sequences between the binding and cleavage domains. In cases where two
fusion proteins, each comprising a cleavage half-domain, are used, the distance
between the near edges of the binding sites can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 25 or
more nucleotides (or any integral value between 1 and 50 nucleotides). Optimal
levels of cleavage can also depend on both the distance between the binding sites of
the two fusion proteins (see, for example, Smith et al. (2000) Nucleic Acids Res.
28:3361-3369; Bibikova et al. (2001) Mol. Cell. Biol. 21:289-297) and the length of
the ZC linker in each fusion protein.

For ZFP-Fok I fusion nucleases, the length of the linker between the ZFP and
the Fok I cleavage half-domain (i.e., the ZC linker) can influence cleavage efficiency.
In one experimental system utilizing a ZFP-Fok I fusion with a ZC linker of 4 amino
acid residues, optimal cleavage was obtained when the near edges of the binding sites
for two ZFP-Fok I nucleases were separated by 6 base pairs. This particular fusion
nuclease comprised the following amino acid sequence between the zinc finger
portion and the nuclease half-domain:
HQRTHQNKKQLV (SEQ ID NO:26)
in which the two conserved histidines in the C-terminal portion of the zinc finger and
the first three residues in the Fok I cleavage half-domain are underlined. Accordingly,
the linker sequence in this construct is QNKK. Bibikova et al. (2001) Mol. Cell. Biol.
21:289-297. The present inventors have constructed a number of ZFP-Fok I fusion
nucleases having a variety of ZC linker lengths and sequences, and analyzed the
cleavage efficiencies of these nucleases on a series of substrates having different
distances between the ZFP binding sites. See Example 4.
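
The way the ZC linker is delimited in the junction sequence above can be sketched as follows, taking the junction as HQRTHQNKKQLV (i.e., ending in the QLV start of the half-domain, consistent with the stated QNKK linker). The function name is hypothetical, and the sketch assumes the half-domain begins with QLV and that the last histidine before it is the second conserved histidine of the C-terminal finger.

```python
def zc_linker(junction, half_domain_start="QLV"):
    """Residues between the last His of the zinc finger and the half-domain start."""
    domain_pos = junction.rfind(half_domain_start)
    if domain_pos == -1:
        raise ValueError("half-domain start motif not found")
    last_his = junction.rfind("H", 0, domain_pos)
    if last_his == -1:
        raise ValueError("no histidine found before the half-domain")
    return junction[last_his + 1:domain_pos]


linker = zc_linker("HQRTHQNKKQLV")
print(linker, len(linker))
```
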

In certain embodiments, the cleavage domain comprises two cleavage
half-domains, both of which are part of a single polypeptide comprising a binding domain,
a first cleavage half-domain and a second cleavage half-domain. The cleavage
half-domains can have the same amino acid sequence or different amino acid sequences,
so long as they function to cleave the DNA.

Cleavage half-domains may also be provided in separate molecules. For
example, two fusion polypeptides may be introduced into a cell, wherein each
polypeptide comprises a binding domain and a cleavage half-domain. The cleavage
half-domains can have the same amino acid sequence or different amino acid
sequences, so long as they function to cleave the DNA. Further, the binding domains
bind to target sequences which are typically disposed in such a way that, upon binding
of the fusion polypeptides, the two cleavage half-domains are presented in a spatial
orientation to each other that allows reconstitution of a cleavage domain (e.g., by
dimerization of the half-domains), thereby positioning the half-domains relative to
each other to form a functional cleavage domain, resulting in cleavage of cellular
chromatin in a region of interest. Generally, cleavage by the reconstituted cleavage
domain occurs at a site located between the two target sequences. One or both of the
proteins can be engineered to bind to its target site.

The two fusion proteins can bind in the region of interest in the same or
opposite polarity, and their binding sites (i.e., target sites) can be separated by any
number of nucleotides, e.g., from 0 to 200 nucleotides or any integral value
therebetween. In certain embodiments, the binding sites for two fusion proteins, each
comprising a zinc finger binding domain and a cleavage half-domain, can be located
between 5 and 18 nucleotides apart, for example, 5-8 nucleotides apart, or 15-18
nucleotides apart, or 6 nucleotides apart, or 16 nucleotides apart, as measured from
the edge of each binding site nearest the other binding site, and cleavage occurs
between the binding sites.

The site at which the DNA is cleaved generally lies between the binding sites
for the two fusion proteins. Double-strand breakage of DNA often results from two
single-strand breaks, or "nicks," offset by 1, 2, 3, 4, 5, 6 or more nucleotides (for
example, cleavage of double-stranded DNA by native Fok I results from single-strand
breaks offset by 4 nucleotides). Thus, cleavage does not necessarily occur at exactly
opposite sites on each DNA strand. In addition, the structure of the fusion proteins
and the distance between the target sites can influence whether cleavage occurs
adjacent to a single nucleotide pair, or whether cleavage occurs at several sites.
However, for many applications, including targeted recombination (see infra),
cleavage within a range of nucleotides is generally sufficient, and cleavage between
particular base pairs is not required.

As noted above, the fusion protein(s) can be introduced as polypeptides and/or
polynucleotides. For example, two polynucleotides, each comprising sequences
encoding one of the aforementioned polypeptides, can be introduced into a cell, and
when the polypeptides are expressed and each binds to its target sequence, cleavage
occurs at or near the target sequence. Alternatively, a single polynucleotide
comprising sequences encoding both fusion polypeptides is introduced into a cell.
Polynucleotides can be DNA, RNA or any modified forms or analogues of DNA
and/or RNA.

To enhance cleavage specificity, additional compositions may also be
employed in the methods described herein. For example, single cleavage
half-domains can exhibit limited double-stranded cleavage activity. In methods in which
two fusion proteins, each containing a three-finger zinc finger domain and a cleavage
half-domain, are introduced into the cell, each protein specifies an approximately
9-nucleotide target site. Although the aggregate target sequence of 18 nucleotides is
likely to be unique in a mammalian genome, any given 9-nucleotide target site occurs,
on average, approximately 23,000 times in the human genome. Thus, non-specific
cleavage, due to the site-specific binding of a single half-domain, may occur.
Accordingly, the methods described herein contemplate the use of a
dominant-negative mutant of a cleavage half-domain such as Fok I (or a nucleic acid encoding
same) that is expressed in a cell along with the two fusion proteins. The
dominant-negative mutant is capable of dimerizing but is unable to cleave, and also blocks the
cleavage activity of a half-domain to which it is dimerized. By providing the
dominant-negative mutant in molar excess to the fusion proteins, only regions in
which both fusion proteins are bound will have a high enough local concentration of
functional cleavage half-domains for dimerization and cleavage to occur. At sites
where only one of the two fusion proteins is bound, its cleavage half-domain forms a
dimer with the dominant-negative mutant half-domain, and undesirable, non-specific
cleavage does not occur.
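
The estimate of approximately 23,000 occurrences for a 9-nucleotide site can be reproduced with a simple calculation. The sketch below rests on assumptions not stated in the text: a haploid genome of about 3 x 10^9 base pairs, equal frequencies of the four bases, and counting both strands. The function name is hypothetical.

```python
def expected_occurrences(site_length_nt, genome_bp=3_000_000_000, both_strands=True):
    """Expected number of occurrences of one specific site in a random genome."""
    positions = genome_bp * (2 if both_strands else 1)
    return positions / 4 ** site_length_nt


print(round(expected_occurrences(9)))       # one three-finger protein
print(round(expected_occurrences(18), 3))   # the aggregate 18-nucleotide target
```

Under these assumptions the 9-nucleotide figure is close to the value recited above, and the expected count for an 18-nucleotide sequence is well below one, which is consistent with the statement that the aggregate target is likely to be unique.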

Three catalytic amino acid residues in the Fok I cleavage half-domain have
been identified: Asp 450, Asp 467 and Lys 469. Bitinaite et al. (1998) Proc. Natl.
Acad. Sci. USA 95:10,570-10,575. Thus, one or more mutations at one of these
residues can be used to generate a dominant negative mutation. Further, many of the
catalytic amino acid residues of other Type IIS endonucleases are known and/or can
be determined, for example, by alignment with Fok I sequences and/or by generation
and testing of mutants for catalytic activity.

Dimerization domain mutations in the cleavage half-domain 

Methods for targeted cleavage which involve the use of fusions between a ZFP
and a cleavage half-domain (such as, e.g., a ZFP/Fok I fusion) require the use of two
such fusion molecules, each generally directed to a distinct target sequence. Target
sequences for the two fusion proteins can be chosen so that targeted cleavage is
directed to a unique site in a genome, as discussed above. A potential source of
reduced cleavage specificity could result from homodimerization of one of the two
ZFP/cleavage half-domain fusions. This might occur, for example, due to the
presence, in a genome, of inverted repeats of the target sequences for one of the two
ZFP/cleavage half-domain fusions, located so as to allow two copies of the same
fusion protein to bind with an orientation and spacing that allows formation of a
functional dimer.
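
A search for such inverted repeats can be sketched as follows. This is an illustration under stated assumptions: exact sequence matches only, a permissive gap of 5-8 nucleotides chosen as an example, and hypothetical names and sequences. The sketch reports every pairing of a target with its reverse complement at a permitted gap and does not decide which relative orientation would actually place the half-domains in a productive arrangement.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_all(seq, pattern):
    """Start indices of every (possibly overlapping) occurrence of pattern."""
    hits, i = [], seq.find(pattern)
    while i != -1:
        hits.append(i)
        i = seq.find(pattern, i + 1)
    return hits


def inverted_repeat_sites(genome, target, allowed_gaps=range(5, 9)):
    """(left_start, right_start, gap) for a target paired with its inverted copy."""
    n = len(target)
    found = set()
    for i in find_all(genome, target):
        for j in find_all(genome, reverse_complement(target)):
            left, right = min(i, j), max(i, j)
            gap = right - (left + n)
            if gap >= 0 and gap in allowed_gaps:
                found.add((left, right, gap))
    return sorted(found)


target = "GCGTGGGCG"  # hypothetical 9-nucleotide target
genome = "TT" + reverse_complement(target) + "ACGTAC" + target + "TT"
print(inverted_repeat_sites(genome, target))
```
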

One approach for reducing the probability of this type of aberrant cleavage at
sequences other than the intended target site involves generating variants of the
cleavage half-domain that minimize or prevent homodimerization. Preferably, one or
more amino acids in the region of the half-domain involved in its dimerization are
altered. In the crystal structure of the Fok I protein dimer, the structure of the cleavage
half-domains is reported to be similar to the arrangement of the cleavage half-domains
during cleavage of DNA by Fok I. Wah et al. (1998) Proc. Natl. Acad. Sci. USA
95:10564-10569. This structure indicates that amino acid residues at positions 483
and 487 play a key role in the dimerization of the Fok I cleavage half-domains. The
structure also indicates that amino acid residues at positions 446, 447, 479, 483, 484,
486, 487, 490, 491, 496, 498, 499, 500, 531, 534, 537, and 538 are all close enough to
the dimerization interface to influence dimerization. Accordingly, amino acid
sequence alterations at one or more of the aforementioned positions will likely alter
the dimerization properties of the cleavage half-domain. Such changes can be
introduced, for example, by constructing a library containing (or encoding) different
amino acid residues at these positions and selecting variants with the desired
properties, or by rationally designing individual mutants. In addition to preventing
homodimerization, it is also possible that some of these mutations may increase the
cleavage efficiency above that obtained with two wild-type cleavage half-domains.

Accordingly, alteration of a Fok I cleavage half-domain at any amino acid
residue which affects dimerization can be used to prevent one of a pair of ZFP/Fok I
fusions from undergoing homodimerization which can lead to cleavage at undesired
sequences. Thus, for targeted cleavage using a pair of ZFP/Fok I fusions, one or both
of the fusion proteins can comprise one or more amino acid alterations that inhibit
self-dimerization, but allow heterodimerization of the two fusion proteins to occur
such that cleavage occurs at the desired target site. In certain embodiments,
alterations are present in both fusion proteins, and the alterations have additive
effects; i.e., homodimerization of either fusion, leading to aberrant cleavage, is
minimized or abolished, while heterodimerization of the two fusion proteins is
facilitated compared to that obtained with wild-type cleavage half-domains. See
Example 5.

Methods for targeted alteration of genomic sequences and targeted 
recombination 

Also described herein are methods of replacing a genomic sequence (e.g., a
region of interest in cellular chromatin) with a homologous non-identical sequence
(i.e., targeted recombination). Previous attempts to replace particular sequences have
involved contacting a cell with a polynucleotide comprising sequences bearing
homology to a chromosomal region (i.e., a donor DNA), followed by selection of
cells in which the donor DNA molecule had undergone homologous recombination
into the genome. The success rate of these methods is low, due to poor efficiency of
homologous recombination and a high frequency of non-specific insertion of the
donor DNA into regions of the genome other than the target site.

The present disclosure provides methods of targeted sequence alteration
characterized by a greater efficiency of targeted recombination and a lower frequency
of non-specific insertion events. The methods involve making and using engineered
zinc finger binding domains fused to cleavage domains (or cleavage half-domains) to
make one or more targeted double-stranded breaks in cellular DNA. Because
double-stranded breaks in cellular DNA stimulate homologous recombination several
thousand-fold in the vicinity of the cleavage site, such targeted cleavage allows for the
alteration or replacement (via homologous recombination) of sequences at virtually
any site in the genome.

In addition to the fusion molecules described herein, targeted replacement of a
selected genomic sequence also requires the introduction of the replacement (or
donor) sequence. The donor sequence can be introduced into the cell prior to,
concurrently with, or subsequent to, expression of the fusion protein(s). The donor
polynucleotide contains sufficient homology to a genomic sequence to support
homologous recombination between it and the genomic sequence to which it bears
homology. Approximately 25, 50, 100 or 200 nucleotides or more of sequence
homology between a donor and a genomic sequence (or any integral value between 10
and 200 nucleotides, or more) will support homologous recombination therebetween.
Donor sequences can range in length from 10 to 5,000 nucleotides (or any integral
value of nucleotides therebetween) or longer. It will be readily apparent that the
donor sequence is typically not identical to the genomic sequence that it replaces. For
example, the sequence of the donor polynucleotide can contain one or more single
base changes, insertions, deletions, inversions or rearrangements with respect to the
genomic sequence, so long as sufficient homology is present to support homologous
recombination. Alternatively, a donor sequence can contain a non-homologous
sequence flanked by two regions of homology. Additionally, donor sequences can
comprise a vector molecule containing sequences that are not homologous to the
region of interest in cellular chromatin. Generally, the homologous region(s) of a
donor sequence will have at least 50% sequence identity to a genomic sequence with
which recombination is desired. In certain embodiments, 60%, 70%, 80%, 90%,
95%, 98%, 99%, or 99.9% sequence identity is present. Any value between 1% and
100% sequence identity can be present, depending upon the length of the donor
polynucleotide.
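
Sequence identity between a donor homology region and the corresponding genomic sequence can be computed, in the simplest case, as sketched below. The sketch assumes two pre-aligned sequences of equal length with no insertions or deletions; real comparisons would normally use an alignment tool. The function name and sequences are hypothetical.

```python
def percent_identity(genomic, donor):
    """Percent of aligned positions at which two equal-length sequences agree."""
    if not genomic or len(genomic) != len(donor):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(g == d for g, d in zip(genomic, donor))
    return 100.0 * matches / len(genomic)


genomic_region = "ACGTACGTAC"
donor_region = "ACGTACGTAA"  # one single-base change relative to the genome
print(percent_identity(genomic_region, donor_region))
```
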

A donor molecule can contain several, discontinuous regions of homology to 
cellular chromatin. For example, for targeted insertion of sequences not normally 
present in a region of interest, said sequences can be present in a donor nucleic acid 
molecule and flanked by regions of homology to sequence in the region of interest. 

To simplify assays (e.g., hybridization, PCR, restriction enzyme digestion) for
determining successful insertion of the donor sequence, certain sequence differences
may be present in the donor sequence as compared to the genomic sequence.
Preferably, if located in a coding region, such nucleotide sequence differences will not
change the amino acid sequence, or will make silent amino acid changes (i.e., changes
which do not affect the structure or function of the protein). The donor
polynucleotide can optionally contain changes in sequences corresponding to the zinc
finger domain binding sites in the region of interest, to prevent cleavage of donor
sequences that have been introduced into cellular chromatin by homologous
recombination.
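
Whether a set of nucleotide differences in a coding region leaves the amino acid sequence unchanged can be checked as in the following sketch. It assumes the standard genetic code, an in-frame coding sequence on the given strand, and hypothetical names and sequences; it tests only for synonymous changes and does not assess amino acid substitutions that are merely functionally silent.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(coding_seq):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    if len(coding_seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    return "".join(CODON_TABLE[coding_seq[i:i + 3]] for i in range(0, len(coding_seq), 3))


def is_synonymous(genomic, donor):
    """True if the donor differs from the genome but encodes the same protein."""
    return genomic != donor and translate(genomic) == translate(donor)


genomic_site = "GCGTGGGCG"  # hypothetical binding site inside a coding region
donor_site = "GCATGGGCC"    # two base changes intended to disrupt binding
print(translate(genomic_site), translate(donor_site))
print(is_synonymous(genomic_site, donor_site))
```
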

The donor polynucleotide can be DNA or RNA, single-stranded or
double-stranded and can be introduced into a cell in linear or circular form. If introduced in
linear form, the ends of the donor sequence can be protected (e.g., from
exonucleolytic degradation) by methods known to those of skill in the art. For
example, one or more dideoxynucleotide residues are added to the 3' terminus of a
linear molecule and/or self-complementary oligonucleotides are ligated to one or both
ends. See, for example, Chang et al. (1987) Proc. Natl. Acad. Sci. USA 84:4959-4963;
Nehls et al. (1996) Science 272:886-889. Additional methods for protecting
exogenous polynucleotides from degradation include, but are not limited to, addition
of terminal amino group(s) and the use of modified internucleotide linkages such as,
for example, phosphorothioates, phosphoramidates, and O-methyl ribose or
deoxyribose residues. A polynucleotide can be introduced into a cell as part of a
vector molecule having additional sequences such as, for example, replication origins,
promoters and genes encoding antibiotic resistance. Moreover, donor polynucleotides
can be introduced as naked nucleic acid, as nucleic acid complexed with an agent
such as a liposome or poloxamer, or can be delivered by viruses (e.g., adenovirus,
AAV).

Without being bound by one theory, it appears that the presence of a
double-stranded break in a cellular sequence, coupled with the presence of an exogenous
DNA molecule having homology to a region adjacent to or surrounding the break,
activates cellular mechanisms which repair the break by transfer of sequence
information from the donor molecule into the cellular (e.g., genomic or chromosomal)
sequence; i.e., by a process of homologous recombination. Applicants' methods
advantageously combine the powerful targeting capabilities of engineered ZFPs with
a cleavage domain (or cleavage half-domain) to specifically target a double-stranded 
break to the region of the genome at which recombination is desired. 

For alteration of a chromosomal sequence, it is not necessary for the entire 
sequence of the donor to be copied into the chromosome, as long as enough of the
donor sequence is copied to effect the desired sequence alteration.

The efficiency of insertion of donor sequences by homologous recombination 
is inversely related to the distance, in the cellular DNA, between the double-stranded 
break and the site at which recombination is desired. In other words, higher 
homologous recombination efficiencies are observed when the double-stranded break
is closer to the site at which recombination is desired. In cases in which a precise site
of recombination is not predetermined (e.g., the desired recombination event can 
occur over an interval of genomic sequence), the length and sequence of the donor 
nucleic acid, together with the site(s) of cleavage, are selected to obtain the desired 
recombination event. In cases in which the desired event is designed to change the
sequence of a single nucleotide pair in a genomic sequence, cellular chromatin is
cleaved within 10,000 nucleotides on either side of that nucleotide pair. In certain 
embodiments, cleavage occurs within 500, 200, 100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 
5, or 2 nucleotides, or any integral value between 2 and 1,000 nucleotides, on either 
side of the nucleotide pair whose sequence is to be changed. 

As detailed above, the binding sites for two fusion proteins, each comprising a
zinc finger binding domain and a cleavage half-domain, can be located 5-8 or 15-18
nucleotides apart, as measured from the edge of each binding site nearest the other
binding site, and cleavage occurs between the binding sites. Whether cleavage occurs
at a single site or at multiple sites between the binding sites is immaterial, since the
cleaved genomic sequences are replaced by the donor sequences. Thus, for efficient 
alteration of the sequence of a single nucleotide pair by targeted recombination, the 
midpoint of the region between the binding sites is within 10,000 nucleotides of that 
nucleotide pair, preferably within 1,000 nucleotides, or 500 nucleotides, or 200
nucleotides, or 100 nucleotides, or 50 nucleotides, or 20 nucleotides, or 10
nucleotides, or 5 nucleotides, or 2 nucleotides, or one nucleotide, or at the nucleotide
pair of interest. 
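
The spacing and distance criteria described above can be expressed as a short calculation. The following Python sketch is offered as an illustration under stated assumptions and is not a disclosed algorithm: binding sites are represented as hypothetical 0-based, half-open (start, end) intervals on one strand, the left site precedes the right site, and the names gap_between, spacing_ok and midpoint_distance are invented for this example.

```python
# Permitted separations between the near edges of the two binding sites,
# per the text: 5-8 or 15-18 nucleotides.
ALLOWED_GAPS = set(range(5, 9)) | set(range(15, 19))


def gap_between(left_site, right_site):
    """Number of nucleotides between the near edges of the two sites."""
    gap = right_site[0] - left_site[1]
    if gap < 0:
        raise ValueError("binding sites overlap or are in the wrong order")
    return gap


def spacing_ok(left_site, right_site):
    """True if the sites are 5-8 or 15-18 nucleotides apart."""
    return gap_between(left_site, right_site) in ALLOWED_GAPS


def midpoint_distance(left_site, right_site, target_pos):
    """Distance from the center of the inter-site region to the target position."""
    # The intervening nucleotides occupy positions left_end .. right_start - 1.
    midpoint = (left_site[1] + right_site[0] - 1) / 2
    return abs(target_pos - midpoint)


# Hypothetical coordinates: two 9-bp sites separated by 6 nucleotides.
left, right = (100, 109), (115, 124)
print(spacing_ok(left, right))
print(midpoint_distance(left, right, target_pos=112) <= 10_000)
```

The window passed to the final comparison can be tightened (e.g., to 1,000, 500 or fewer nucleotides) to reflect the preferred ranges recited above.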

In certain embodiments, a homologous chromosome can serve as the donor
polynucleotide. Thus, for example, correction of a mutation in a heterozygote can be
achieved by engineering fusion proteins which bind to and cleave the mutant 
sequence on one chromosome, but do not cleave the wild-type sequence on the 
homologous chromosome. The double-stranded break on the mutation-bearing 
chromosome stimulates a homology-based "gene conversion" process in which the
wild-type sequence from the homologous chromosome is copied into the cleaved
chromosome, thus restoring two copies of the wild-type sequence. 
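
As a rough illustration of the allele-specific targeting described in this embodiment, a candidate binding site can be screened for presence in the mutant allele and absence from the wild-type allele. The Python sketch below is a simplification made for illustration: it treats binding as an exact sequence match on either strand (actual zinc finger binding specificity is not all-or-none), and the sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def site_present(site, allele):
    """True if the site occurs on either strand of the allele."""
    site, allele = site.upper(), allele.upper()
    return site in allele or reverse_complement(site) in allele


def allele_specific(site, mutant_allele, wild_type_allele):
    """True if the site is found in the mutant allele but not in the wild type."""
    return site_present(site, mutant_allele) and not site_present(site, wild_type_allele)


# Made-up alleles differing at one position (A in wild type, T in mutant).
wild_type = "ACGTGAGGAGAAGTCT"
mutant = "ACGTGTGGAGAAGTCT"

print(allele_specific("GTGTGGAGA", mutant, wild_type))  # site spans the mutation
print(allele_specific("GGAGAAGTC", mutant, wild_type))  # site shared by both alleles
```

Only a site that overlaps the mutated position can distinguish the two chromosomes; a site lying entirely in shared sequence would direct cleavage of both.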

Methods and compositions are also provided that may enhance levels of 
targeted recombination including, but not limited to, the use of additional
ZFP-functional domain fusions to activate expression of genes involved in homologous
recombination, such as, for example, members of the RAD52 epistasis group (e.g.,
Rad50, Rad51, Rad51B, Rad51C, Rad51D, Rad52, Rad54, Rad54B, Mre11, XRCC2,
XRCC3), genes whose products interact with the aforementioned gene products (e.g.,
BRCA1, BRCA2) and/or genes in the NBS1 complex. Similarly, ZFP-functional
domain fusions can be used, in combination with the methods and compositions
disclosed herein, to repress expression of genes involved in non-homologous end
joining (e.g., Ku70/80, XRCC4, poly(ADP ribose) polymerase, DNA ligase 4). See,
for example, Yanez et al. (1998) Gene Therapy 5:149-159; Hoeijmakers (2001)
Nature 411:366-374; Johnson et al. (2001) Biochem. Soc. Trans. 29:196-201;
Tauchi et al. (2002) Oncogene 21:8967-8980. Methods for activation and repression
of gene expression using fusions between a zinc finger binding domain and a
functional domain are disclosed in co-owned U.S. Patent No. 6,534,261. Additional
repression methods include the use of antisense oligonucleotides and/or small
interfering RNA (siRNA or RNAi) targeted to the sequence of the gene to be
repressed.

As an alternative to, or in addition to, activating expression of gene products
involved in homologous recombination, fusions of these proteins (or functional
fragments thereof) with a zinc finger binding domain targeted to the region of interest
can be used to recruit these proteins (recombination proteins) to the region of interest,
thereby increasing their local concentration and further stimulating homologous
recombination processes. Alternatively, a polypeptide involved in homologous
recombination as described above (or a functional fragment thereof) can be part of a
triple fusion protein comprising a zinc finger binding domain, a cleavage domain (or
cleavage half-domain) and the recombination protein (or functional fragment thereof).
Additional proteins involved in gene conversion and recombination-related chromatin
remodeling, which can be used in the aforementioned methods and compositions,
include histone acetyltransferases (e.g., Esa1p, Tip60), histone methyltransferases
(e.g., Dot1p), histone kinases and histone phosphatases.

The p53 protein has been reported to play a central role in repressing
homologous recombination (HR). See, for example, Valerie et al. (2003) Oncogene
22:5792-5812; Janz et al. (2002) Oncogene 21:5929-5933. For example, the rate of
HR in p53-deficient human tumor lines is 10,000-fold greater than in primary human
fibroblasts, and there is a 100-fold increase in HR in tumor cells with a non-functional
p53 compared to those with functional p53. Mekeel et al. (1997) Oncogene
14:1847-1857. In addition, overexpression of p53 dominant negative mutants leads to a
20-fold increase in spontaneous recombination. Bertrand et al. (1997) Oncogene
14:1117-1122. Analysis of different p53 mutations has revealed that the roles of p53
in transcriptional transactivation and G1 cell cycle checkpoint control are separable
from its involvement in HR. Saintigny et al. (1999) Oncogene 18:3553-3563;
Boehden et al. (2003) Oncogene 22:4111-4117. Accordingly, downregulation of p53
activity can serve to increase the efficiency of targeted homologous recombination
using the methods and compositions disclosed herein. Any method for
downregulation of p53 activity can be used, including but not limited to
cotransfection and overexpression of a p53 dominant negative mutant or targeted
repression of p53 gene expression according to methods disclosed, e.g., in co-owned
U.S. Patent No. 6,534,261.

Further increases in efficiency of targeted recombination, in cells comprising a
zinc finger/nuclease fusion molecule and a donor DNA molecule, are achieved by
blocking the cells in the G2 phase of the cell cycle, when homology-driven repair
processes are maximally active. Such arrest can be achieved in a number of ways.
For example, cells can be treated with, e.g., drugs, compounds and/or small molecules
which influence cell-cycle progression so as to arrest cells in G2 phase. Exemplary
molecules of this type include, but are not limited to, compounds which affect
microtubule polymerization (e.g., vinblastine, nocodazole, Taxol), compounds that
interact with DNA (e.g., cis-platinum(II) diamine dichloride, Cisplatin, doxorubicin)
and/or compounds that affect DNA synthesis (e.g., thymidine, hydroxyurea,
L-mimosine, etoposide, 5-fluorouracil). Additional increases in recombination
efficiency are achieved by the use of histone deacetylase (HDAC) inhibitors (e.g.,
sodium butyrate, trichostatin A) which alter chromatin structure to make genomic
DNA more accessible to the cellular recombination machinery.

Additional methods for cell-cycle arrest include overexpression of proteins
which inhibit the activity of the CDK cell-cycle kinases, for example, by introducing
a cDNA encoding the protein into the cell or by introducing into the cell an
engineered ZFP which activates expression of the gene encoding the protein.
Cell-cycle arrest is also achieved by inhibiting the activity of cyclins and CDKs, for
example, using RNAi methods (e.g., U.S. Patent No. 6,506,559) or by introducing
into the cell an engineered ZFP which represses expression of one or more genes
involved in cell-cycle progression such as, for example, cyclin and/or CDK genes.
See, e.g., co-owned U.S. Patent No. 6,534,261 for methods for the synthesis of
engineered zinc finger proteins for regulation of gene expression.

Alternatively, in certain cases, targeted cleavage is conducted in the absence
of a donor polynucleotide (preferably in S or G2 phase), and recombination occurs
between homologous chromosomes.



Methods to screen for cellular factors that facilitate homologous
recombination

Since homologous recombination is a multi-step process requiring the 
modification of DNA ends and the recruitment of several cellular factors into a 
protein complex, the addition of one or more exogenous factors, along with donor
DNA and vectors encoding zinc finger-cleavage domain fusions, can be used to
facilitate targeted homologous recombination. An exemplary method for identifying 
such a factor or factors employs analyses of gene expression using microarrays (e.g., 
Affymetrix GeneChip® arrays) to compare the mRNA expression patterns of
different cells. For example, cells that exhibit a higher capacity to stimulate double
strand break-driven homologous recombination in the presence of donor DNA and 
zinc finger-cleavage domain fusions, either unaided or under conditions known to 
increase the level of gene correction, can be analyzed for their gene expression 
patterns compared to cells that lack such capacity. Genes that are upregulated or
downregulated in a manner that directly correlates with increased levels of
homologous recombination are thereby identified and can be cloned into any one of a
number of expression vectors. These expression constructs can be co-transfected 
along with zinc finger-cleavage domain fusions and donor constructs to yield 
improved methods for achieving high-efficiency homologous recombination. 

Alternatively, expression of such genes can be appropriately regulated using
engineered zinc finger proteins which modulate expression (either activation or
repression) of one or more of these genes. See, e.g., co-owned U.S. Patent No.
6,534,261 for methods for the synthesis of engineered zinc finger proteins for 
regulation of gene expression. 

As an example, it was observed that the different clones obtained in the
experiments described in Example 9 and Figure 27 exhibited a wide range of
homologous recombination frequencies when transfected with donor DNA and
plasmids encoding zinc finger-cleavage domain fusions. Gene expression in clones
showing a high frequency of targeted recombination can thus be compared to that in
clones exhibiting a low frequency, and expression patterns unique to the former
clones can be identified.
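
One simple way to carry out such a comparison is to rank genes by the log2 ratio of their expression in a high-frequency clone to that in a low-frequency clone. The Python sketch below is a minimal illustration under stated assumptions, not the analysis actually used: expression values are assumed to be already normalized and non-negative, the pseudocount and threshold are arbitrary choices, and the gene names and numbers are placeholders rather than experimental data.

```python
import math


def differential_genes(high, low, min_abs_log2_fc=1.0, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes changed beyond the threshold.

    `high` and `low` map gene names to normalized expression values in a
    high-recombination clone and a low-recombination clone, respectively.
    Only genes measured in both clones are compared.
    """
    changed = {}
    for gene in high.keys() & low.keys():
        ratio = (high[gene] + pseudocount) / (low[gene] + pseudocount)
        log2_fc = math.log2(ratio)
        if abs(log2_fc) >= min_abs_log2_fc:
            changed[gene] = log2_fc
    # Largest absolute changes first.
    return dict(sorted(changed.items(), key=lambda item: -abs(item[1])))


# Placeholder values for illustration only.
high_clone = {"geneA": 7.0, "geneB": 3.0, "geneC": 0.0}
low_clone = {"geneA": 1.0, "geneB": 3.0, "geneC": 3.0}

for gene, log2_fc in differential_genes(high_clone, low_clone).items():
    direction = "up" if log2_fc > 0 else "down"
    print(f"{gene}: {direction}regulated in high clone (log2 FC = {log2_fc:+.2f})")
```

In practice, replicate measurements and a statistical test would be needed before treating any gene as a candidate; the ratio alone only orders genes for follow-up.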

As an additional example, studies using cell cycle inhibitors (e.g., nocodazole
or vinblastine, see e.g., Examples 11, 14 and 15) showed that cells arrested in the G2
phase of the cell cycle carried out homologous recombination at higher rates,
indicating that cellular factors responsible for homologous recombination may be
preferentially expressed or active in G2. One way to identify these factors is to
compare the mRNA expression patterns between the stably transfected HEK 293 cell
clones that carry out gene correction at high and low levels (e.g., clone T18 vs. clone
T7). Similar comparisons are made between these cell lines in response to
compounds that arrest the cells in G2 phase. Candidate genes that are differentially
expressed in cells that carry out homologous recombination at a higher rate, either
unaided or in response to compounds that arrest the cells in G2, are identified, cloned,
and re-introduced into cells to determine whether their expression is sufficient to
recapitulate the improved rates. Alternatively, expression of said candidate genes is
activated using engineered zinc finger transcription factors as described, for example,
in co-owned U.S. Patent No. 6,534,261.

Expression vectors

A nucleic acid encoding one or more ZFPs or ZFP fusion proteins can be 
cloned into a vector for transformation into prokaryotic or eukaryotic cells for 
replication and/or expression. Vectors can be prokaryotic vectors, e.g., plasmids, or 
shuttle vectors, insect vectors, or eukaryotic vectors. A nucleic acid encoding a ZFP 
can also be cloned into an expression vector, for administration to a plant cell, animal
cell, preferably a mammalian cell or a human cell, fungal cell, bacterial cell, or 
protozoal cell. 

To obtain expression of a cloned gene or nucleic acid, sequences encoding a 
ZFP or ZFP fusion protein are typically subcloned into an expression vector that
contains a promoter to direct transcription. Suitable bacterial and eukaryotic
promoters are well known in the art and described, e.g., in Sambrook et al., Molecular
Cloning, A Laboratory Manual (2nd ed. 1989; 3rd ed., 2001); Kriegler, Gene Transfer
and Expression: A Laboratory Manual (1990); and Current Protocols in Molecular
Biology (Ausubel et al., supra). Bacterial expression systems for expressing the ZFP
are available in, e.g., E. coli, Bacillus sp., and Salmonella (Palva et al., Gene
22:229-235 (1983)). Kits for such expression systems are commercially available.
Eukaryotic expression systems for mammalian cells, yeast, and insect cells are well 
known by those of skill in the art and are also commercially available. 

The promoter used to direct expression of a ZFP-encoding nucleic acid
depends on the particular application. For example, a strong constitutive promoter is
typically used for expression and purification of ZFP. In contrast, when a ZFP is
administered in vivo for gene regulation, either a constitutive or an inducible promoter
is used, depending on the particular use of the ZFP. In addition, a preferred promoter
for administration of a ZFP can be a weak promoter, such as HSV TK or a promoter
having similar activity. The promoter typically can also include elements that are
responsive to transactivation, e.g., hypoxia response elements, Gal4 response
elements, lac repressor response element, and small molecule control systems such as
tet-regulated systems and the RU-486 system (see, e.g., Gossen & Bujard, PNAS
89:5547 (1992); Oligino et al., Gene Ther. 5:491-496 (1998); Wang et al., Gene Ther.
4:432-441 (1997); Neering et al., Blood 88:1147-1155 (1996); and Rendahl et al.,
Nat. Biotechnol. 16:757-761 (1998)). The MNDU3 promoter can also be used, and is
preferentially active in CD34+ hematopoietic stem cells.

In addition to the promoter, the expression vector typically contains a
transcription unit or expression cassette that contains all the additional elements
required for the expression of the nucleic acid in host cells, either prokaryotic or 
eukaryotic. A typical expression cassette thus contains a promoter operably linked, 
e.g., to a nucleic acid sequence encoding the ZFP, and signals required, e.g., for
efficient polyadenylation of the transcript, transcriptional termination, ribosome
binding sites, or translation termination. Additional elements of the cassette may 
include, e.g., enhancers, and heterologous splicing signals. 

The particular expression vector used to transport the genetic information into 
the cell is selected with regard to the intended use of the ZFP, e.g., expression in
plants, animals, bacteria, fungus, protozoa, etc. (see expression vectors described
below). Standard bacterial expression vectors include plasmids such as pBR322-
based plasmids, pSKF, pET23D, and commercially available fusion expression
systems such as GST and LacZ. An exemplary fusion protein is the maltose binding
protein, "MBP." Such fusion proteins are used for purification of the ZFP. Epitope
tags can also be added to recombinant proteins to provide convenient methods of
isolation, for monitoring expression, and for monitoring cellular and subcellular 
localization, e.g., c-myc or FLAG. 

Expression vectors containing regulatory elements from eukaryotic viruses are 
often used in eukaryotic expression vectors, e.g., SV40 vectors, papilloma virus
vectors, and vectors derived from Epstein-Barr virus. Other exemplary eukaryotic
vectors include pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus
pDSVE, and any other vector allowing expression of proteins under the direction of
the SV40 early promoter, SV40 late promoter, metallothionein promoter, murine
mammary tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter,
or other promoters shown effective for expression in eukaryotic cells. 

Some expression systems have markers for selection of stably transfected cell 
lines such as thymidine kinase, hygromycin B phosphotransferase, and dihydrofolate
reductase. High yield expression systems are also suitable, such as using a
baculovirus vector in insect cells, with a ZFP-encoding sequence under the direction
of the polyhedrin promoter or other strong baculovirus promoters. 

The elements that are typically included in expression vectors also include a 
replicon that functions in E. coli, a gene encoding antibiotic resistance to permit 
selection of bacteria that harbor recombinant plasmids, and unique restriction sites in
nonessential regions of the plasmid to allow insertion of recombinant sequences. 

Standard transfection methods are used to produce bacterial, mammalian,
yeast or insect cell lines that express large quantities of protein, which are then
purified using standard techniques (see, e.g., Colley et al., J. Biol. Chem.
264:17619-17622 (1989); Guide to Protein Purification, in Methods in Enzymology, vol. 182
(Deutscher, ed., 1990)). Transformation of eukaryotic and prokaryotic cells is
performed according to standard techniques (see, e.g., Morrison, J. Bact. 132:349-351
(1977); Clark-Curtiss & Curtiss, Methods in Enzymology 101:347-362 (Wu et al., eds,
1983)).

Any of the well known procedures for introducing foreign nucleotide
sequences into host cells may be used. These include the use of calcium phosphate
transfection, polybrene, protoplast fusion, electroporation, ultrasonic methods (e.g.,
sonoporation), liposomes, microinjection, naked DNA, plasmid vectors, viral vectors,
both episomal and integrative, and any of the other well known methods for
introducing cloned genomic DNA, cDNA, synthetic DNA or other foreign genetic
material into a host cell (see, e.g., Sambrook et al., supra). It is only necessary that
the particular genetic engineering procedure used be capable of successfully 
introducing at least one gene into the host cell capable of expressing the protein of 
choice. 


Nucleic acids encoding fusion proteins and delivery to cells 

Conventional viral and non-viral based gene transfer methods can be used to
introduce nucleic acids encoding engineered ZFPs in cells (e.g., mammalian cells) and
target tissues. Such methods can also be used to administer nucleic acids encoding
ZFPs to cells in vitro. In certain embodiments, nucleic acids encoding ZFPs are
administered for in vivo or ex vivo gene therapy uses. Non-viral vector delivery
systems include DNA plasmids, naked nucleic acid, and nucleic acid complexed with
a delivery vehicle such as a liposome or poloxamer. Viral vector delivery systems
include DNA and RNA viruses, which have either episomal or integrated genomes
after delivery to the cell. For a review of gene therapy procedures, see Anderson,
Science 256:808-813 (1992); Nabel & Felgner, TIBTECH 11:211-217 (1993); Mitani
& Caskey, TIBTECH 11:162-166 (1993); Dillon, TIBTECH 11:167-175 (1993);
Miller, Nature 357:455-460 (1992); Van Brunt, Biotechnology 6(10):1149-1154
(1988); Vigne, Restorative Neurology and Neuroscience 8:35-36 (1995); Kremer &
Perricaudet, British Medical Bulletin 51(1):31-44 (1995); Haddada et al., in Current
Topics in Microbiology and Immunology, Doerfler and Böhm (eds) (1995); and Yu et
al., Gene Therapy 1:13-26 (1994).

Methods of non-viral delivery of nucleic acids encoding engineered ZFPs
include electroporation, lipofection, microinjection, biolistics, virosomes, liposomes,
immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial 
virions, and agent-enhanced uptake of DNA. Sonoporation using, e.g., the Sonitron 
2000 system (Rich-Mar) can also be used for delivery of nucleic acids. 

Additional exemplary nucleic acid delivery systems include those provided by
Amaxa Biosystems (Cologne, Germany), Maxcyte, Inc. (Rockville, Maryland) and
BTX Molecular Delivery Systems (Holliston, MA). 

Lipofection is described in, e.g., US 5,049,386, US 4,946,787; and US
4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and
Lipofectin™). Cationic and neutral lipids that are suitable for efficient
receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424,
WO 91/16024. Delivery can be to cells (ex vivo administration) or target tissues (in
vivo administration).

The preparation of lipid:nucleic acid complexes, including targeted liposomes
such as immunolipid complexes, is well known to one of skill in the art (see, e.g.,
Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297
(1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate
Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al.,
Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871,
4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787).

The use of RNA or DNA viral based systems for the delivery of nucleic acids
encoding engineered ZFPs takes advantage of highly evolved processes for targeting a
virus to specific cells in the body and trafficking the viral payload to the nucleus.
Viral vectors can be administered directly to patients (in vivo) or they can be used to
treat cells in vitro and the modified cells are administered to patients (ex vivo).
Conventional viral based systems for the delivery of ZFPs include, but are not limited
to, retroviral, lentivirus, adenoviral, adeno-associated, vaccinia and herpes simplex
virus vectors for gene transfer. Integration in the host genome is possible with the
retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often 
resulting in long term expression of the inserted transgene. Additionally, high 
transduction efficiencies have been observed in many different cell types and target 
tissues. 

The tropism of a retrovirus can be altered by incorporating foreign envelope
proteins, expanding the potential population of target cells. Lentiviral vectors
are retroviral vectors that are able to transduce or infect non-dividing cells and
typically produce high viral titers. Selection of a retroviral gene transfer system
depends on the target tissue. Retroviral vectors are comprised of cis-acting long
terminal repeats with packaging capacity for up to 6-10 kb of foreign sequence. The
minimum cis-acting LTRs are sufficient for replication and packaging of the vectors,
which are then used to integrate the therapeutic gene into the target cell to provide
permanent transgene expression. Widely used retroviral vectors include those based
upon murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian
immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and
combinations thereof (see, e.g., Buchscher et al., J. Virol. 66:2731-2739 (1992);
Johann et al., J. Virol. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59
(1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Miller et al., J. Virol.
65:2220-2224 (1991); PCT/US94/05700).

In applications in which transient expression of a ZFP fusion protein is
preferred, adenoviral based systems can be used. Adenoviral based vectors are
capable of very high transduction efficiency in many cell types and do not require cell
division. With such vectors, high titer and high levels of expression have been
obtained. This vector can be produced in large quantities in a relatively simple
system. Adeno-associated virus ("AAV") vectors are also used to transduce cells
with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides,
and for in vivo and ex vivo gene therapy procedures (see, e.g., West et al., Virology
160:38-47 (1987); U.S. Patent No. 4,797,368; WO 93/24641; Kotin, Human Gene
Therapy 5:793-801 (1994); Muzyczka, J. Clin. Invest. 94:1351 (1994)). Construction
of recombinant AAV vectors is described in a number of publications, including
U.S. Pat. No. 5,173,414; Tratschin et al., Mol. Cell. Biol. 5:3251-3260 (1985);
Tratschin, et al., Mol. Cell. Biol. 4:2072-2081 (1984); Hermonat & Muzyczka, PNAS
81:6466-6470 (1984); and Samulski et al., J. Virol. 63:03822-3828 (1989).

At least six viral vector approaches are currently available for gene transfer in 
clinical trials, which utilize approaches that involve complementation of defective 
vectors by genes inserted into helper cell lines to generate the transducing agent. 

pLASN and MFG-S are examples of retroviral vectors that have been used in
clinical trials (Dunbar et al., Blood 85:3048-305 (1995); Kohn et al., Nat. Med.
1:1017-102 (1995); Malech et al., PNAS 94:22 12133-12138 (1997)). PA317/pLASN
was the first therapeutic vector used in a gene therapy trial. (Blaese et al., Science
270:475-480 (1995)). Transduction efficiencies of 50% or greater have been
observed for MFG-S packaged vectors. (Ellem et al., Immunol Immunother.
44(1):10-20 (1997); Dranoff et al., Hum. Gene Ther. 1:111-2 (1997)).

Recombinant adeno-associated virus vectors (rAAV) are a promising
alternative gene delivery system based on the defective and nonpathogenic
parvovirus adeno-associated type 2 virus. All vectors are derived from a plasmid that
retains only the AAV 145 bp inverted terminal repeats flanking the transgene
expression cassette. Efficient gene transfer and stable transgene delivery due to
integration into the genomes of the transduced cell are key features for this vector 
system. (Wagner et al, Lancet 351:9117 1702-3 (1998), Kearns et al, Gene Ther. 
9:748-55 (1996)). 

Replication-deficient recombinant adenoviral vectors (Ad) can be produced at
high titer and readily infect a number of different cell types. Most adenovirus vectors
are engineered such that a transgene replaces the Ad E1a, E1b, and/or E3 genes;
subsequently the replication defective vector is propagated in human 293 cells that
supply deleted gene function in trans. Ad vectors can transduce multiple types of
tissues in vivo, including nondividing, differentiated cells such as those found in liver,
kidney and muscle. Conventional Ad vectors have a large carrying capacity. An
example of the use of an Ad vector in a clinical trial involved polynucleotide therapy
for antitumor immunization with intramuscular injection (Sterman et al., Hum. Gene
Ther. 7:1083-9 (1998)). Additional examples of the use of adenovirus vectors for
gene transfer in clinical trials include Rosenecker et al., Infection 24:1 5-10 (1996);
Sterman et al., Hum. Gene Ther. 9:7 1083-1089 (1998); Welsh et al., Hum. Gene
Ther. 2:205-18 (1995); Alvarez et al., Hum. Gene Ther. 5:597-613 (1997); Topf et al.,
Gene Ther. 5:507-513 (1998); Sterman et al., Hum. Gene Ther. 7:1083-1089 (1998).

Packaging cells are used to form virus particles that are capable of infecting a
host cell. Such cells include 293 cells, which package adenovirus, and ψ2 cells or
PA317 cells, which package retrovirus. Viral vectors used in gene therapy are usually
generated by a producer cell line that packages a nucleic acid vector into a viral
particle. The vectors typically contain the minimal viral sequences required for
packaging and subsequent integration into a host (if applicable), other viral sequences
being replaced by an expression cassette encoding the protein to be expressed. The
missing viral functions are supplied in trans by the packaging cell line. For example,
AAV vectors used in gene therapy typically only possess inverted terminal repeat
(ITR) sequences from the AAV genome which are required for packaging and
integration into the host genome. Viral DNA is packaged in a cell line, which
contains a helper plasmid encoding the other AAV genes, namely rep and cap, but
lacking ITR sequences. The cell line is also infected with adenovirus as a helper. The
helper virus promotes replication of the AAV vector and expression of AAV genes
from the helper plasmid. The helper plasmid is not packaged in significant amounts
due to a lack of ITR sequences. Contamination with adenovirus can be reduced by,
e.g., heat treatment to which adenovirus is more sensitive than AAV.

In many gene therapy applications, it is desirable that the gene therapy vector
be delivered with a high degree of specificity to a particular tissue type. Accordingly,
a viral vector can be modified to have specificity for a given cell type by expressing a
ligand as a fusion protein with a viral coat protein on the outer surface of the virus.
The ligand is chosen to have affinity for a receptor known to be present on the cell
type of interest. For example, Han et al., Proc. Natl. Acad. Sci. USA 92:9747-9751
(1995), reported that Moloney murine leukemia virus can be modified to express
human heregulin fused to gp70, and the recombinant virus infects certain human
breast cancer cells expressing human epidermal growth factor receptor. This principle
can be extended to other virus-target cell pairs, in which the target cell expresses a
receptor and the virus expresses a fusion protein comprising a ligand for the
cell-surface receptor. For example, filamentous phage can be engineered to display
antibody fragments (e.g., FAB or Fv) having specific binding affinity for virtually any
chosen cellular receptor. Although the above description applies primarily to viral
vectors, the same principles can be applied to nonviral vectors. Such vectors can be
engineered to contain specific uptake sequences which favor uptake by specific target
cells.

Gene therapy vectors can be delivered in vivo by administration to an
individual patient, typically by systemic administration (e.g., intravenous,
intraperitoneal, intramuscular, subdermal, or intracranial infusion) or topical
application, as described below. Alternatively, vectors can be delivered to cells ex
vivo, such as cells explanted from an individual patient (e.g., lymphocytes, bone
marrow aspirates, tissue biopsy) or universal donor hematopoietic stem cells,
followed by reimplantation of the cells into a patient, usually after selection for cells
which have incorporated the vector.

Ex vivo cell transfection for diagnostics, research, or for gene therapy (e.g., via
re-infusion of the transfected cells into the host organism) is well known to those of
skill in the art. In a preferred embodiment, cells are isolated from the subject
organism, transfected with a ZFP nucleic acid (gene or cDNA), and re-infused
into the subject organism (e.g., patient). Various cell types suitable for ex vivo
transfection are well known to those of skill in the art (see, e.g., Freshney et al.,
Culture of Animal Cells, A Manual of Basic Technique (3rd ed. 1994) and the
references cited therein for a discussion of how to isolate and culture cells from
patients).

In one embodiment, stem cells are used in ex vivo procedures for cell
transfection and gene therapy. The advantage to using stem cells is that they can be
differentiated into other cell types in vitro, or can be introduced into a mammal (such
as the donor of the cells) where they will engraft in the bone marrow. Methods for
differentiating CD34+ cells in vitro into clinically important immune cell types using
cytokines such as GM-CSF, IFN-γ and TNF-α are known (see Inaba et al., J. Exp.
Med. 176:1693-1702 (1992)).

Stem cells are isolated for transduction and differentiation using known
methods. For example, stem cells are isolated from bone marrow cells by panning the
bone marrow cells with antibodies which bind unwanted cells, such as CD4+ and
CD8+ (T cells), CD45+ (panB cells), GR-1 (granulocytes), and Iad (differentiated
antigen presenting cells) (see Inaba et al., J. Exp. Med. 176:1693-1702 (1992)).

Vectors (e.g., retroviruses, adenoviruses, liposomes, etc.) containing
therapeutic ZFP nucleic acids can also be administered directly to an organism for
transduction of cells in vivo. Alternatively, naked DNA can be administered.
Administration is by any of the routes normally used for introducing a molecule into
ultimate contact with blood or tissue cells including, but not limited to, injection,
infusion, topical application and electroporation. Suitable methods of administering
such nucleic acids are available and well known to those of skill in the art, and,
although more than one route can be used to administer a particular composition, a
particular route can often provide a more immediate and more effective reaction than
another route.

Methods for introduction of DNA into hematopoietic stem cells are disclosed, 
for example, in U.S. Patent No. 5,928,638. 

Pharmaceutically acceptable carriers are determined in part by the particular
composition being administered, as well as by the particular method used to
administer the composition. Accordingly, there is a wide variety of suitable
formulations of pharmaceutical compositions available, as described below (see, e.g.,
Remington's Pharmaceutical Sciences, 17th ed., 1989).

DNA constructs may be introduced into the genome of a desired plant host by
a variety of conventional techniques. For reviews of such techniques see, for
example, Weissbach & Weissbach, Methods for Plant Molecular Biology (1988,
Academic Press, N.Y.) Section VIII, pp. 421-463; and Grierson & Corey, Plant
Molecular Biology (1988, 2d Ed.), Blackie, London, Ch. 7-9. For example, the DNA
construct may be introduced directly into the genomic DNA of the plant cell using
techniques such as electroporation and microinjection of plant cell protoplasts, or the
DNA constructs can be introduced directly to plant tissue using biolistic methods,
such as DNA particle bombardment (see, e.g., Klein et al. (1987) Nature 327:70-73).
Alternatively, the DNA constructs may be combined with suitable T-DNA flanking
regions and introduced into a conventional Agrobacterium tumefaciens host vector.
Agrobacterium tumefaciens-mediated transformation techniques, including disarming
and use of binary vectors, are well described in the scientific literature. See, for
example, Horsch et al. (1984) Science 233:496-498, and Fraley et al. (1983) Proc. Nat'l.
Acad. Sci. USA 80:4803. The virulence functions of the Agrobacterium tumefaciens
host will direct the insertion of the construct and adjacent marker into the plant cell
DNA when the cell is infected by the bacteria using binary T DNA vector (Bevan
(1984) Nuc. Acid Res. 12:8711-8721) or the co-cultivation procedure (Horsch et al.
(1985) Science 227:1229-1231). Generally, the Agrobacterium transformation system
is used to engineer dicotyledonous plants (Bevan et al. (1982) Ann. Rev. Genet.
16:357-384; Rogers et al. (1986) Methods Enzymol. 118:627-641). The
Agrobacterium transformation system may also be used to transform, as well as
transfer, DNA to monocotyledonous plants and plant cells. See Hernalsteen et al.
(1984) EMBO J. 3:3039-3041; Hooykass-Van Slogteren et al. (1984) Nature
311:763-764; Grimsley et al. (1987) Nature 325:1677-179; Boulton et al. (1989) Plant
Mol. Biol. 12:31-40; and Gould et al. (1991) Plant Physiol. 95:426-434.

Alternative gene transfer and transformation methods include, but are not
limited to, protoplast transformation through calcium-, polyethylene glycol (PEG)- or
electroporation-mediated uptake of naked DNA (see Paszkowski et al. (1984) EMBO
J. 3:2717-2722; Potrykus et al. (1985) Molec. Gen. Genet. 199:169-177; Fromm et al.
(1985) Proc. Nat. Acad. Sci. USA 82:5824-5828; and Shimamoto (1989) Nature
338:274-276) and electroporation of plant tissues (D'Halluin et al. (1992) Plant Cell
4:1495-1505). Additional methods for plant cell transformation include
microinjection, silicon carbide mediated DNA uptake (Kaeppler et al. (1990) Plant
Cell Reporter 9:415-418), and microprojectile bombardment (see Klein et al. (1988)
Proc. Nat. Acad. Sci. USA 85:4305-4309; and Gordon-Kamm et al. (1990) Plant Cell
2:603-618).

The disclosed methods and compositions can be used to insert exogenous
sequences into a predetermined location in a plant cell genome. This is useful
inasmuch as expression of a transgene introduced into a plant genome depends
critically on its integration site. Accordingly, genes encoding, e.g., nutrients,
antibiotics or therapeutic molecules can be inserted, by targeted recombination, into
regions of a plant genome favorable to their expression.

Transformed plant cells which are produced by any of the above
transformation techniques can be cultured to regenerate a whole plant which
possesses the transformed genotype and thus the desired phenotype. Such
regeneration techniques rely on manipulation of certain phytohormones in a tissue
culture growth medium, typically relying on a biocide and/or herbicide marker which
has been introduced together with the desired nucleotide sequences. Plant
regeneration from cultured protoplasts is described in Evans, et al., "Protoplasts
Isolation and Culture" in Handbook of Plant Cell Culture, pp. 124-176, Macmillan
Publishing Company, New York, 1983; and Binding, Regeneration of Plants, Plant
Protoplasts, pp. 21-73, CRC Press, Boca Raton, 1985. Regeneration can also be
obtained from plant callus, explants, organs, pollens, embryos or parts thereof. Such
regeneration techniques are described generally in Klee et al. (1987) Ann. Rev. of
Plant Phys. 38:467-486.

Nucleic acids introduced into a plant cell can be used to confer desired traits
on essentially any plant. A wide variety of plants and plant cell systems may be
engineered for the desired physiological and agronomic characteristics described
herein using the nucleic acid constructs of the present disclosure and the various
transformation methods mentioned above. In preferred embodiments, target plants
and plant cells for engineering include, but are not limited to, those
monocotyledonous and dicotyledonous plants, such as crops including grain crops
(e.g., wheat, maize, rice, millet, barley), fruit crops (e.g., tomato, apple, pear,
strawberry, orange), forage crops (e.g., alfalfa), root vegetable crops (e.g., carrot,
potato, sugar beets, yam), leafy vegetable crops (e.g., lettuce, spinach); flowering
plants (e.g., petunia, rose, chrysanthemum), conifers and pine trees (e.g., pine, fir,
spruce); plants used in phytoremediation (e.g., heavy metal accumulating plants); oil
crops (e.g., sunflower, rape seed) and plants used for experimental purposes (e.g.,
Arabidopsis). Thus, the disclosed methods and compositions have use over a broad
range of plants, including, but not limited to, species from the genera Asparagus,
Avena, Brassica, Citrus, Citrullus, Capsicum, Cucurbita, Daucus, Glycine, Hordeum,
Lactuca, Lycopersicon, Malus, Manihot, Nicotiana, Oryza, Persea, Pisum, Pyrus,
Prunus, Raphanus, Secale, Solanum, Sorghum, Triticum, Vitis, Vigna, and Zea.

One of skill in the art will recognize that after the expression cassette is stably
incorporated in transgenic plants and confirmed to be operable, it can be introduced
into other plants by sexual crossing. Any of a number of standard breeding
techniques can be used, depending upon the species to be crossed.

A transformed plant cell, callus, tissue or plant may be identified and isolated
by selecting or screening the engineered plant material for traits encoded by the
marker genes present on the transforming DNA. For instance, selection may be
performed by growing the engineered plant material on media containing an
inhibitory amount of the antibiotic or herbicide to which the transforming gene
construct confers resistance. Further, transformed plants and plant cells may also be
identified by screening for the activities of any visible marker genes (e.g., the
β-glucuronidase, luciferase, B or C1 genes) that may be present on the recombinant
nucleic acid constructs. Such selection and screening methodologies are well known
to those skilled in the art.

Physical and biochemical methods also may be used to identify plant or plant
cell transformants containing inserted gene constructs. These methods include but are
not limited to: 1) Southern analysis or PCR amplification for detecting and
determining the structure of the recombinant DNA insert; 2) Northern blot, S1 RNase
protection, primer-extension or reverse transcriptase-PCR amplification for detecting
and examining RNA transcripts of the gene constructs; 3) enzymatic assays for
detecting enzyme or ribozyme activity, where such gene products are encoded by the
gene construct; 4) protein gel electrophoresis, Western blot techniques,
immunoprecipitation, or enzyme-linked immunoassays, where the gene construct
products are proteins. Additional techniques, such as in situ hybridization, enzyme
staining, and immunostaining, also may be used to detect the presence or expression
of the recombinant construct in specific plant organs and tissues. The methods for
doing all these assays are well known to those skilled in the art.

Effects of gene manipulation using the methods disclosed herein can be
observed by, for example, northern blots of the RNA (e.g., mRNA) isolated from the
tissues of interest. Typically, if the amount of mRNA has increased, it can be
assumed that the corresponding endogenous gene is being expressed at a greater rate
than before. Other methods of measuring gene and/or CYP74B activity can be used.
Different types of enzymatic assays can be used, depending on the substrate used and
the method of detecting the increase or decrease of a reaction product or by-product.
In addition, the levels of and/or CYP74B protein expressed can be measured
immunochemically, i.e., ELISA, RIA, EIA and other antibody based assays well
known to those of skill in the art, such as by electrophoretic detection assays (either
with staining or western blotting). The transgene may be selectively expressed in
some tissues of the plant or at some developmental stages, or the transgene may be
expressed in substantially all plant tissues, substantially along its entire life cycle.
However, any combinatorial expression mode is also applicable.

The present disclosure also encompasses seeds of the transgenic plants
described above wherein the seed has the transgene or gene construct. The present
disclosure further encompasses the progeny, clones, cell lines or cells of the
transgenic plants described above wherein said progeny, clone, cell line or cell has the
transgene or gene construct.

Delivery vehicles

An important factor in the administration of polypeptide compounds, such as
ZFP fusion proteins, is ensuring that the polypeptide has the ability to traverse the
plasma membrane of a cell, or the membrane of an intra-cellular compartment such as
the nucleus. Cellular membranes are composed of lipid-protein bilayers that are
freely permeable to small, nonionic lipophilic compounds and are inherently
impermeable to polar compounds, macromolecules, and therapeutic or diagnostic
agents. However, proteins and other compounds such as liposomes have been
described, which have the ability to translocate polypeptides such as ZFPs across a
cell membrane.

For example, "membrane translocation polypeptides" have amphiphilic or
hydrophobic amino acid subsequences that have the ability to act as
membrane-translocating carriers. In one embodiment, homeodomain proteins have the ability to
translocate across cell membranes. The shortest internalizable peptide of a
homeodomain protein, Antennapedia, was found to be the third helix of the protein,
from amino acid position 43 to 58 (see, e.g., Prochiantz, Current Opinion in
Neurobiology 6:629-634 (1996)). Another subsequence, the h (hydrophobic) domain
of signal peptides, was found to have similar cell membrane translocation
characteristics (see, e.g., Lin et al., J. Biol. Chem. 270:14255-14258 (1995)).

Examples of peptide sequences which can be linked to a protein, for
facilitating uptake of the protein into cells, include, but are not limited to: an 11 amino
acid peptide of the tat protein of HIV; a 20 residue peptide sequence which
corresponds to amino acids 84-103 of the p16 protein (see Fahraeus et al., Current
Biology 6:84 (1996)); the third helix of the 60-amino acid long homeodomain of
Antennapedia (Derossi et al., J. Biol. Chem. 269:10444 (1994)); the h region of a
signal peptide such as the Kaposi fibroblast growth factor (K-FGF) h region (Lin et
al., supra); or the VP22 translocation domain from HSV (Elliot & O'Hare, Cell
88:223-233 (1997)). Other suitable chemical moieties that provide enhanced cellular
uptake may also be chemically linked to ZFPs. Membrane translocation domains
(i.e., internalization domains) can also be selected from libraries of randomized
peptide sequences. See, for example, Yeh et al. (2003) Molecular Therapy 7(5):S461,
Abstract #1191.

Toxin molecules also have the ability to transport polypeptides across cell
membranes. Often, such molecules (called "binary toxins") are composed of at least
two parts: a translocation/binding domain or polypeptide and a separate toxin domain
or polypeptide. Typically, the translocation domain or polypeptide binds to a cellular
receptor, and then the toxin is transported into the cell. Several bacterial toxins,
including Clostridium perfringens iota toxin, diphtheria toxin (DT), Pseudomonas
exotoxin A (PE), pertussis toxin (PT), Bacillus anthracis toxin, and pertussis
adenylate cyclase (CYA), have been used to deliver peptides to the cell cytosol as
internal or amino-terminal fusions (Arora et al., J. Biol. Chem., 268:3334-3341
(1993); Perelle et al., Infect. Immun., 61:5147-5156 (1993); Stenmark et al., J. Cell
Biol. 113:1025-1032 (1991); Donnelly et al., PNAS 90:3530-3534 (1993); Carbonetti
et al., Abstr. Annu. Meet. Am. Soc. Microbiol. 95:295 (1995); Sebo et al., Infect.
Immun. 63:3851-3857 (1995); Klimpel et al., PNAS U.S.A. 89:10277-10281 (1992);
and Novak et al., J. Biol. Chem. 267:17186-17193 (1992)).

Such peptide sequences can be used to translocate ZFPs across a cell 
membrane. ZFPs can be conveniently fused to or derivatized with such sequences. 
Typically, the translocation sequence is provided as part of a fusion protein. 
Optionally, a linker can be used to link the ZFP and the translocation sequence. Any 
suitable linker can be used, e.g., a peptide linker.

The ZFP can also be introduced into an animal cell, preferably a mammalian
cell, via liposomes and liposome derivatives such as immunoliposomes. The term
"liposome" refers to vesicles comprised of one or more concentrically ordered lipid
bilayers, which encapsulate an aqueous phase. The aqueous phase typically contains
the compound to be delivered to the cell, i.e., a ZFP.

The liposome fuses with the plasma membrane, thereby releasing the drug into
the cytosol. Alternatively, the liposome is phagocytosed or taken up by the cell in a
transport vesicle. Once in the endosome or phagosome, the liposome either degrades
or fuses with the membrane of the transport vesicle and releases its contents.

In current methods of drug delivery via liposomes, the liposome ultimately
becomes permeable and releases the encapsulated compound (in this case, a ZFP) at
the target tissue or cell. For systemic or tissue specific delivery, this can be
accomplished, for example, in a passive manner wherein the liposome bilayer
degrades over time through the action of various agents in the body. Alternatively,
active drug release involves using an agent to induce a permeability change in the
liposome vesicle. Liposome membranes can be constructed so that they become
destabilized when the environment becomes acidic near the liposome membrane (see,
e.g., PNAS 84:7851 (1987); Biochemistry 28:908 (1989)). When liposomes are
endocytosed by a target cell, for example, they become destabilized and release their
contents. This destabilization is termed fusogenesis.
Dioleoylphosphatidylethanolamine (DOPE) is the basis of many "fusogenic" systems.

Such liposomes typically comprise a ZFP and a lipid component, e.g., a
neutral and/or cationic lipid, optionally including a receptor-recognition molecule
such as an antibody that binds to a predetermined cell surface receptor or ligand (e.g.,
an antigen). A variety of methods are available for preparing liposomes as described
in, e.g., Szoka et al., Ann. Rev. Biophys. Bioeng. 9:467 (1980), U.S. Pat. Nos.
4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085,
4,837,028, 4,946,787, PCT Publication No. WO 91/17424, Deamer & Bangham, Biochim.
Biophys. Acta 443:629-634 (1976); Fraley, et al., PNAS 76:3348-3352 (1979); Hope
et al., Biochim. Biophys. Acta 812:55-65 (1985); Mayer et al., Biochim. Biophys. Acta
858:161-168 (1986); Williams et al., PNAS 85:242-246 (1988); Liposomes (Ostro
(ed.), 1983, Chapter 1); Hope et al., Chem. Phys. Lip. 40:89 (1986); Gregoriadis,
Liposome Technology (1984) and Lasic, Liposomes: from Physics to Applications
(1993). Suitable methods include, for example, sonication, extrusion, high
pressure/homogenization, microfluidization, detergent dialysis, calcium-induced
fusion of small liposome vesicles and ether-fusion methods, all of which are known to
those of skill in the art.

In certain embodiments, it is desirable to target liposomes using targeting
moieties that are specific to a particular cell type, tissue, and the like. Targeting of
liposomes using a variety of targeting moieties (e.g., ligands, receptors, and
monoclonal antibodies) has been described. See, e.g., U.S. Patent Nos. 4,957,773 and
4,603,044.

Examples of targeting moieties include monoclonal antibodies specific to
antigens associated with neoplasms, such as prostate cancer specific antigen and
MAGE. Tumors can also be diagnosed by detecting gene products resulting from the
activation or over-expression of oncogenes, such as ras or c-erbB2. In addition, many
tumors express antigens normally expressed by fetal tissue, such as the
alphafetoprotein (AFP) and carcinoembryonic antigen (CEA). Sites of viral infection
can be diagnosed using various viral antigens such as hepatitis B core and surface
antigens (HBVc, HBVs), hepatitis C antigens, Epstein-Barr virus antigens, human
immunodeficiency type-1 virus (HIV1) and papilloma virus antigens. Inflammation
can be detected using molecules specifically recognized by surface molecules which
are expressed at sites of inflammation such as integrins (e.g., VCAM-1), selectin
receptors (e.g., ELAM-1) and the like.

Standard methods for coupling targeting agents to liposomes can be used.
These methods generally involve incorporation into liposomes of lipid components,
e.g., phosphatidylethanolamine, which can be activated for attachment of targeting
agents, or derivatized lipophilic compounds, such as lipid derivatized bleomycin.
Antibody targeted liposomes can be constructed using, for instance, liposomes which
incorporate protein A (see Renneisen et al., J. Biol. Chem., 265:16337-16342 (1990)
and Leonetti et al., PNAS 87:2448-2451 (1990)).

Dosages 

For therapeutic applications, the dose administered to a patient, or to a cell
which will be introduced into a patient, in the context of the present disclosure, should
be sufficient to effect a beneficial therapeutic response in the patient over time. In
addition, particular dosage regimens can be useful for determining phenotypic
changes in an experimental setting, e.g., in functional genomics studies, and in cell or
animal models. The dose will be determined by the efficacy and Kd of the particular
ZFP employed, the nuclear volume of the target cell, and the condition of the patient,
as well as the body weight or surface area of the patient to be treated. The size of the
dose also will be determined by the existence, nature, and extent of any adverse
side-effects that accompany the administration of a particular compound or vector in a
particular patient.

The maximum therapeutically effective dosage of ZFP for approximately 99%
binding to target sites is calculated to be in the range of less than about 1.5 x 10^5 to
1.5 x 10^6 copies of the specific ZFP molecule per cell. The number of ZFPs per cell for
this level of binding is calculated as follows, using the volume of a HeLa cell nucleus
(approximately 1000 µm^3 or 10^-12 L; Cell Biology (Altman & Katz, eds. (1976))). As
the HeLa nucleus is relatively large, this dosage number is recalculated as needed
using the volume of the target cell nucleus. This calculation also does not take into
account competition for ZFP binding by other sites. This calculation also assumes
that essentially all of the ZFP is localized to the nucleus. A value of 100 x Kd is used
to calculate approximately 99% binding to the target site, and a value of 10 x Kd is
used to calculate approximately 90% binding to the target site. For this example,
Kd = 25 nM.

    ZFP + target site <-> complex
    i.e., DNA + protein <-> DNA:protein complex

    Kd = [DNA] [protein] / [DNA:protein complex]

When 50% of the target is bound, Kd = [protein].

So when [protein] = 25 nM and the nucleus volume is 10^-12 L:

    [protein] = (25 x 10^-9 moles/L) (10^-12 L/nucleus) (6 x 10^23 molecules/mole)
              = 15,000 molecules/nucleus for 50% binding

When 99% of the target is bound, 100 x Kd = [protein]:

    100 x Kd = [protein] = 2.5 µM
    (2.5 x 10^-6 moles/L) (10^-12 L/nucleus) (6 x 10^23 molecules/mole)
              = about 1,500,000 molecules per nucleus for 99% binding of
                target site.
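
The arithmetic above can be reproduced with a short script. The following is a minimal sketch in Python, not part of the disclosed methods: the function names are illustrative, Avogadro's number is rounded to 6 x 10^23 as in the text, and the simple one-site binding relation (fraction bound = [protein] / (Kd + [protein])) is assumed, with free protein taken to be in excess over target sites.

```python
AVOGADRO = 6e23  # molecules per mole, rounded as in the text


def fraction_bound(protein_molar, kd_molar):
    """Fraction of target sites occupied under simple one-site binding.

    Assumes free protein is in large excess over target sites.
    """
    return protein_molar / (kd_molar + protein_molar)


def copies_per_nucleus(protein_molar, nuclear_volume_l):
    """Number of protein molecules needed to reach a molar concentration
    in a nucleus of the given volume (liters)."""
    return protein_molar * nuclear_volume_l * AVOGADRO


KD = 25e-9             # 25 nM, the example Kd from the text
HELA_NUCLEUS_L = 1e-12  # approximate HeLa nuclear volume from the text

for multiple in (1, 10, 100):
    conc = multiple * KD
    print(
        f"{multiple:>3} x Kd: {fraction_bound(conc, KD):.1%} bound, "
        f"{copies_per_nucleus(conc, HELA_NUCLEUS_L):,.0f} copies per nucleus"
    )
```

For a target cell with a smaller nucleus, passing that cell's nuclear volume in place of the HeLa value rescales the copy number proportionally, as the text describes.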

The appropriate dose of an expression vector encoding a ZFP can also be
calculated by taking into account the average rate of ZFP expression from the
promoter and the average rate of ZFP degradation in the cell. In certain embodiments,
a weak promoter such as a wild-type or mutant HSV TK promoter is used, as
described above. The dose of ZFP in micrograms is calculated by taking into account
the molecular weight of the particular ZFP being employed.
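
As an illustration of the conversion from copies per cell to micrograms, the following Python sketch may be helpful. The molecular weight (40,000 g/mol) and the number of cells (10^6) are hypothetical values chosen only for the example and do not come from the disclosure. The result is the mass of protein inside the cells and does not account for delivery efficiency or degradation.

```python
AVOGADRO = 6e23  # molecules per mole, rounded as in the dosage calculation


def micrograms_for_copies(copies_per_cell, n_cells, mol_weight_g_per_mol):
    """Mass of protein (micrograms) corresponding to a given copy number
    per cell across n_cells cells."""
    moles = copies_per_cell * n_cells / AVOGADRO
    grams = moles * mol_weight_g_per_mol
    return grams * 1e6  # grams to micrograms


# Hypothetical inputs, for illustration only
ZFP_MOL_WEIGHT = 40_000.0  # g/mol, assumed size of a ZFP fusion protein
N_CELLS = 1e6              # assumed number of target cells
COPIES_99_PERCENT = 1.5e6  # copies per nucleus for ~99% binding (from the text)

dose_ug = micrograms_for_copies(COPIES_99_PERCENT, N_CELLS, ZFP_MOL_WEIGHT)
print(f"{dose_ug:.3g} micrograms of ZFP")
```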

In determining the effective amount of the ZFP to be administered in the
treatment or prophylaxis of disease, the physician evaluates circulating plasma levels
of the ZFP or nucleic acid encoding the ZFP, potential ZFP toxicities, progression of
the disease, and the production of anti-ZFP antibodies. Administration can be
accomplished via single or divided doses.

Pharmaceutical compositions and administration 

ZFPs and expression vectors encoding ZFPs can be administered directly to
the patient for targeted cleavage and/or recombination, and for therapeutic or
prophylactic applications, for example, cancer, ischemia, diabetic retinopathy,
macular degeneration, rheumatoid arthritis, psoriasis, HIV infection, sickle cell
anemia, Alzheimer's disease, muscular dystrophy, neurodegenerative diseases,
vascular disease, cystic fibrosis, stroke, and the like. Examples of microorganisms
that can be inhibited by ZFP gene therapy include pathogenic bacteria, e.g.,
chlamydia, rickettsial bacteria, mycobacteria, staphylococci, streptococci,
pneumococci, meningococci and gonococci, klebsiella, proteus, serratia,
pseudomonas, legionella, diphtheria, salmonella, bacilli, cholera, tetanus, botulism,
anthrax, plague, leptospirosis, and Lyme disease bacteria; infectious fungus, e.g.,
Aspergillus, Candida species; protozoa such as sporozoa (e.g., Plasmodia), rhizopods
(e.g., Entamoeba) and flagellates (Trypanosoma, Leishmania, Trichomonas, Giardia,
etc.); viral diseases, e.g., hepatitis (A, B, or C), herpes virus (e.g., VZV, HSV-1,
HSV-6, HSV-II, CMV, and EBV), HIV, Ebola, adenovirus, influenza virus, flaviviruses,
echovirus, rhinovirus, coxsackie virus, coronavirus, respiratory syncytial virus,
mumps virus, rotavirus, measles virus, rubella virus, parvovirus, vaccinia virus,
HTLV virus, dengue virus, papillomavirus, poliovirus, rabies virus, and arboviral
encephalitis virus, etc.

Administration of therapeutically effective amounts is by any of the routes
normally used for introducing ZFP into ultimate contact with the tissue to be treated.
The ZFPs are administered in any suitable manner, preferably with pharmaceutically
acceptable carriers. Suitable methods of administering such modulators are available
and well known to those of skill in the art, and, although more than one route can be
used to administer a particular composition, a particular route can often provide a
more immediate and more effective reaction than another route.

Pharmaceutically acceptable carriers are determined in part by the particular
composition being administered, as well as by the particular method used to
administer the composition. Accordingly, there is a wide variety of suitable
formulations of pharmaceutical compositions that are available (see, e.g., Remington's
Pharmaceutical Sciences, 17th ed. 1985).

The ZFPs, alone or in combination with other suitable components, can be
made into aerosol formulations (i.e., they can be "nebulized") to be administered via
inhalation. Aerosol formulations can be placed into pressurized acceptable
propellants, such as dichlorodifluoromethane, propane, nitrogen, and the like.

Formulations suitable for parenteral administration, such as, for example, by
intravenous, intramuscular, intradermal, and subcutaneous routes, include aqueous
and non-aqueous, isotonic sterile injection solutions, which can contain antioxidants,
buffers, bacteriostats, and solutes that render the formulation isotonic with the blood
of the intended recipient, and aqueous and non-aqueous sterile suspensions that can
include suspending agents, solubilizers, thickening agents, stabilizers, and
preservatives. The disclosed compositions can be administered, for example, by
intravenous infusion, orally, topically, intraperitoneally, intravesically or intrathecally.
The formulations of compounds can be presented in unit-dose or multi-dose sealed
containers, such as ampules and vials. Injection solutions and suspensions can be
prepared from sterile powders, granules, and tablets of the kind previously described.

Applications 

The disclosed methods and compositions for targeted cleavage can be used to
induce mutations in a genomic sequence, e.g., by cleaving at two sites and deleting
sequences in between, by cleavage at a single site followed by non-homologous end
joining, and/or by cleaving at a site so as to remove one or two or a few nucleotides.
Targeted cleavage can also be used to create gene knock-outs (e.g., for functional
genomics or target validation) and to facilitate targeted insertion of a sequence into a
genome (i.e., gene knock-in); e.g., for purposes of cell engineering or protein
overexpression. Insertion can be by means of replacements of chromosomal
sequences through homologous recombination or by targeted integration, in which a
new sequence (i.e., a sequence not present in the region of interest), flanked by
sequences homologous to the region of interest in the chromosome, is inserted at a
predetermined target site.

The same methods can also be used to replace a wild-type sequence with a 
mutant sequence, or to convert one allele to a different allele. 

Targeted cleavage of infecting or integrated viral genomes can be used to treat
viral infections in a host. Additionally, targeted cleavage of genes encoding receptors
for viruses can be used to block expression of such receptors, thereby preventing viral
infection and/or viral spread in a host organism. Targeted mutagenesis of genes
encoding viral receptors (e.g., the CCR5 and CXCR4 receptors for HIV) can be used
to render the receptors unable to bind to virus, thereby preventing new infection and
blocking the spread of existing infections. Non-limiting examples of viruses or viral
receptors that may be targeted include herpes simplex virus (HSV), such as HSV-1
and HSV-2, varicella zoster virus (VZV), Epstein-Barr virus (EBV) and
cytomegalovirus (CMV), HHV6 and HHV7. The hepatitis family of viruses includes
hepatitis A virus (HAV), hepatitis B virus (HBV), hepatitis C virus (HCV), the delta
hepatitis virus (HDV), hepatitis E virus (HEV) and hepatitis G virus (HGV). Other
viruses or their receptors may be targeted, including, but not limited to, Picornaviridae
(e.g., polioviruses, etc.); Caliciviridae; Togaviridae (e.g., rubella virus, dengue virus,
etc.); Flaviviridae; Coronaviridae; Reoviridae; Birnaviridae; Rhabdoviridae (e.g.,
rabies virus, etc.); Filoviridae; Paramyxoviridae (e.g., mumps virus, measles virus,
respiratory syncytial virus, etc.); Orthomyxoviridae (e.g., influenza virus types A, B
and C, etc.); Bunyaviridae; Arenaviridae; Retroviridae; lentiviruses (e.g., HTLV-I;
HTLV-II; HIV-1 (also known as HTLV-III, LAV, ARV, hTLR, etc.); HIV-II); simian
immunodeficiency virus (SIV), human papillomavirus (HPV), influenza virus and the
tick-borne encephalitis viruses. See, e.g., Virology, 3rd Edition (W. K. Joklik ed.
1988); Fundamental Virology, 2nd Edition (B. N. Fields and D. M. Knipe, eds. 1991),
for a description of these and other viruses. Receptors for HIV, for example, include
CCR-5 and CXCR-4.

In similar fashion, the genome of an infecting bacterium can be mutagenized
by targeted DNA cleavage followed by non-homologous end joining, to block or
ameliorate bacterial infections.

The disclosed methods for targeted recombination can be used to replace any
genomic sequence with a homologous, non-identical sequence. For example, a
mutant genomic sequence can be replaced by its wild-type counterpart, thereby
providing methods for treatment of, e.g., genetic disease, inherited disorders, cancer,
and autoimmune disease. In like fashion, one allele of a gene can be replaced by a
different allele using the methods of targeted recombination disclosed herein.

Exemplary genetic diseases include, but are not limited to, achondroplasia,
achromatopsia, acid maltase deficiency, adenosine deaminase deficiency (OMIM
No. 102700), adrenoleukodystrophy, Aicardi syndrome, alpha-1 antitrypsin deficiency,
alpha-thalassemia, androgen insensitivity syndrome, Apert syndrome, arrhythmogenic
right ventricular dysplasia, ataxia telangiectasia, Barth syndrome, beta-thalassemia,
blue rubber bleb nevus syndrome, Canavan disease, chronic granulomatous diseases
(CGD), cri du chat syndrome, cystic fibrosis, Dercum's disease, ectodermal dysplasia,
Fanconi anemia, fibrodysplasia ossificans progressiva, fragile X syndrome,
galactosemia, Gaucher's disease, generalized gangliosidoses (e.g., GM1),
hemochromatosis, the hemoglobin C mutation in the 6th codon of beta-globin (HbC),
hemophilia, Huntington's disease, Hurler syndrome, hypophosphatasia, Klinefelter
syndrome, Krabbe's disease, Langer-Giedion syndrome, leukocyte adhesion
deficiency (LAD, OMIM No. 116920), leukodystrophy, long QT syndrome, Marfan
syndrome, Moebius syndrome, mucopolysaccharidosis (MPS), nail patella syndrome,
nephrogenic diabetes insipidus, neurofibromatosis, Niemann-Pick disease,
osteogenesis imperfecta, porphyria, Prader-Willi syndrome, progeria, Proteus
syndrome, retinoblastoma, Rett syndrome, Rubinstein-Taybi syndrome, Sanfilippo
syndrome, severe combined immunodeficiency (SCID), Shwachman syndrome, sickle
cell disease (sickle cell anemia), Smith-Magenis syndrome, Stickler syndrome,
Tay-Sachs disease, Thrombocytopenia Absent Radius (TAR) syndrome, Treacher Collins
syndrome, trisomy, tuberous sclerosis, Turner's syndrome, urea cycle disorder, von
Hippel-Lindau disease, Waardenburg syndrome, Williams syndrome, Wilson's
disease, Wiskott-Aldrich syndrome, X-linked lymphoproliferative syndrome (XLP,
OMIM No. 308240).

Additional exemplary diseases that can be treated by targeted DNA cleavage
and/or homologous recombination include acquired immunodeficiencies, lysosomal
storage diseases (e.g., Gaucher's disease, GM1, Fabry disease and Tay-Sachs
disease), mucopolysaccharidosis (e.g., Hunter's disease, Hurler's disease),
hemoglobinopathies (e.g., sickle cell diseases, HbC, α-thalassemia, β-thalassemia)
and hemophilias.

In certain cases, alteration of a genomic sequence in a pluripotent cell (e.g., a
hematopoietic stem cell) is desired. Methods for mobilization, enrichment and culture
of hematopoietic stem cells are known in the art. See, for example, U.S. Patents
5,061,620; 5,681,559; 6,335,195; 6,645,489 and 6,667,064. Treated stem cells can
be returned to a patient for treatment of various diseases including, but not limited to,
SCID and sickle-cell anemia.

In many of these cases, a region of interest comprises a mutation, and the
donor polynucleotide comprises the corresponding wild-type sequence. Similarly, a
wild-type genomic sequence can be replaced by a mutant sequence, if such is
desirable. For example, overexpression of an oncogene can be reversed either by
mutating the gene or by replacing its control sequences with sequences that support a
lower, non-pathologic level of expression. As another example, the wild-type allele
of the ApoAI gene can be replaced by the ApoAI Milano allele, to treat
atherosclerosis. Indeed, any pathology dependent upon a particular genomic
sequence, in any fashion, can be corrected or alleviated using the methods and
compositions disclosed herein.

Targeted cleavage and targeted recombination can also be used to alter
non-coding sequences (e.g., regulatory sequences such as promoters, enhancers, initiators,
terminators, splice sites) to alter the levels of expression of a gene product. Such
methods can be used, for example, for therapeutic purposes, functional genomics
and/or target validation studies.

The compositions and methods described herein also allow for novel
approaches and systems to address immune reactions of a host to allogeneic grafts. In
particular, a major problem faced when allogeneic stem cells (or any type of
allogeneic cell) are grafted into a host recipient is the high risk of rejection by the
host's immune system, primarily mediated through recognition of the Major
Histocompatibility Complex (MHC) on the surface of the engrafted cells. The MHC
comprises the HLA class I protein(s) that function as heterodimers that are comprised
of a common β subunit and variable α subunits. It has been demonstrated that tissue
grafts derived from stem cells that are devoid of HLA escape the host's immune
response. See, e.g., Coffman et al. J Immunol 151, 425-35 (1993); Markmann et al.
Transplantation 54, 1085-9 (1992); Koller et al. Science 248, 1227-30 (1990).
Using the compositions and methods described herein, genes encoding HLA proteins
involved in graft rejection can be cleaved, mutagenized or altered by recombination,
in either their coding or regulatory sequences, so that their expression is blocked or
they express a non-functional product. For example, by inactivating the gene
encoding the common β subunit (β2 microglobulin) using ZFP fusion proteins as
described herein, HLA class I can be removed from the cells to rapidly and reliably
generate HLA class I null stem cells from any donor, thereby reducing the need for
closely matched donor/recipient MHC haplotypes during stem cell grafting.

Inactivation of any gene (e.g., the β2 microglobulin gene) can be achieved, for
example, by a single cleavage event, by cleavage followed by non-homologous end
joining, by cleavage at two sites followed by joining so as to delete the sequence
between the two cleavage sites, by targeted recombination of a missense or nonsense
codon into the coding region, or by targeted recombination of an irrelevant sequence
(i.e., a "stuffer" sequence) into the gene or its regulatory region, so as to disrupt the
gene or regulatory region.

Targeted modification of chromatin structure, as disclosed in co-owned
WO 01/83793, can be used to facilitate the binding of fusion proteins to cellular
chromatin.

In additional embodiments, one or more fusions between a zinc finger binding
domain and a recombinase (or functional fragment thereof) can be used, in addition to
or instead of the zinc finger-cleavage domain fusions disclosed herein, to facilitate
targeted recombination. See, for example, co-owned US Patent No. 6,534,261 and
Akopian et al. (2003) Proc. Natl. Acad. Sci. USA 100:8688-8691.

In additional embodiments, the disclosed methods and compositions are used
to provide fusions of ZFP binding domains with transcriptional activation or
repression domains that require dimerization (either homodimerization or
heterodimerization) for their activity. In these cases, a fusion polypeptide comprises a
zinc finger binding domain and a functional domain monomer (e.g., a monomer from
a dimeric transcriptional activation or repression domain). Binding of two such
fusion polypeptides to properly situated target sites allows dimerization so as to
reconstitute a functional transcription activation or repression domain.

EXAMPLES 

Example 1: Editing of a Chromosomal hSMC1L1 Gene by Targeted
Recombination

The hSMC1L1 gene is the human orthologue of the budding yeast gene
structural maintenance of chromosomes 1. A region of this gene encoding an
amino-terminal portion of the protein which includes the Walker ATPase domain was
mutagenized by targeted cleavage and recombination. Cleavage was targeted to the
region of the methionine initiation codon (nucleotides 24-26, Figure 1), by designing
chimeric nucleases, comprising a zinc finger DNA-binding domain and a FokI
cleavage half-domain, which bind in the vicinity of the codon. Thus, two zinc finger
binding domains were designed, one of which recognizes nucleotides 23-34 (primary
contacts along the top strand as shown in Figure 1), and the other of which recognizes
nucleotides 5-16 (primary contacts along the bottom strand). Zinc finger proteins
were designed as described in co-owned US Patents 6,453,242 and 6,534,261. See
Table 2 for the amino acid sequences of the recognition regions of the zinc finger
proteins.

Sequences encoding each of these two ZFP binding domains were fused to
sequences encoding a FokI cleavage half-domain (amino acids 384-579 of the native
FokI sequence; Kita et al. (1989) J. Biol. Chem. 264:5751-5756), such that the
encoded protein contained FokI sequences at the carboxy terminus and ZFP
sequences at the amino terminus. Each of these fusion sequences was then cloned in a
modified mammalian expression vector pcDNA3 (Figure 2).

Table 2: Zinc Finger Designs for the hSMC1L1 Gene

Target sequence               F1                       F2                       F3                       F4
CATGGGGTTCCT (SEQ ID NO: 27)  RSHDLIE (SEQ ID NO: 28)  TSSSLSR (SEQ ID NO: 29)  RSDHLST (SEQ ID NO: 30)  TNSNRIT (SEQ ID NO: 31)
GCGGCGCCGGCG (SEQ ID NO: 32)  RSDDLSR (SEQ ID NO: 33)  RSDDRKT (SEQ ID NO: 34)  RSEDLIR (SEQ ID NO: 35)  RSDTLSR (SEQ ID NO: 36)

Note: The zinc finger amino acid sequences shown above (in one-letter code) represent
residues -1 through +6, with respect to the start of the alpha-helical portion of each zinc finger. Finger
F1 is closest to the amino terminus of the protein, and Finger F4 is closest to the carboxy terminus.



A donor DNA molecule was obtained as follows. First, a 700 base pair
fragment of human genomic DNA representing nucleotides 52415936-52416635 of
the "-" strand of the X chromosome (UCSC human genome release July, 2003), which
includes the first exon of the human hSMC1L1 gene, was amplified, using genomic
DNA from HEK293 cells as template. Sequences of primers used for amplification
are shown in Table 3 ("Initial amp 1" and "Initial amp 2"). The PCR product was
then altered, using standard overlap extension PCR methodology (see, e.g., Ho et al.
(1989) Gene 77:51-59), resulting in replacement of the sequence ATGGGG
(nucleotides 24-29 in Figure 1) by ATAAGAAGC. This change resulted in
conversion of the ATG codon (methionine) to an ATA codon (isoleucine) and
replacement of GGG (nucleotides 27-29 in Figure 1) by the sequence AGAAGC,
allowing discrimination between donor-derived sequences and endogenous
chromosomal sequences following recombination. A schematic diagram of the
hSMC1 gene, including sequences of the chromosomal DNA in the region of the
initiation codon, and sequences in the donor DNA that differ from the chromosomal
sequence, is given in Figure 3. The resulting 700 base pair donor fragment was
cloned into pCR4BluntTopo, which does not contain any sequences homologous to
the human genome. See Figure 4.
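
The sequence change described above can be checked with a short script. The
following is a minimal sketch in Python and is not part of the original disclosure.
It applies the ATGGGG to ATAAGAAGC replacement to the 12-nucleotide ZFP target
sequence from Table 2, which is used here only as a stand-in for the local
chromosomal context, and then tests whether the reverse complement of the
donor-specific primer from Table 3 overlaps the altered sequence. All function and
variable names are illustrative assumptions.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Local chromosomal context: ZFP target site from Table 2 (SEQ ID NO: 27).
# Using this short stretch as the "chromosome" is an assumption for illustration.
chromosomal = "CATGGGGTTCCT"

# Replacement described in the text: ATGGGG -> ATAAGAAGC.
donor = chromosomal.replace("ATGGGG", "ATAAGAAGC", 1)

# Donor-specific primer from Table 3 (SEQ ID NO: 40) and the strand it anneals to.
donor_primer = "CAATCAGTTTCAGGAAGCTTCTT"
primer_site = reverse_complement(donor_primer)

print("donor context:        ", donor)
print("primer binding site:  ", primer_site)
print("site found in donor:  ", primer_site[:12] in donor)
print("site found in genome: ", primer_site[:12] in chromosomal)
```

In this sketch the first twelve bases of the primer binding site, which correspond
to the 3' end of the primer, match the altered donor context but not the unaltered
chromosomal context. That is consistent with the stated purpose of the change,
namely discrimination between donor-derived and endogenous sequences.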

For targeted mutation of the chromosomal hSMC1L1 gene, the two plasmids
encoding ZFP-FokI fusions and the donor plasmid were introduced into 1 x 10^6
HEK293 cells by transfection using Lipofectamine 2000® (Invitrogen). Controls
included cells transfected only with the two plasmids encoding the ZFP-FokI fusions,
cells transfected only with the donor plasmid and cells transfected with a control
plasmid (pEGFP-N1, Clontech). Cells were cultured in 5% CO2 at 37°C. At 48 hours
after transfection, genomic DNA was isolated from the cells, and 200 ng was used as
template for PCR amplification, using one primer complementary to a region of the
gene outside of its region of homology with the donor sequences (nucleotides
52416677-52416701 on the "-" strand of the X chromosome; UCSC July 2003),
and a second primer complementary to a region of the donor molecule into which
distinguishing mutations were introduced. Using these two primers, an amplification
product of 400 base pairs will be obtained from genomic DNA if a targeted
recombination event has occurred. The sequences of these primers are given in Table
3 (labeled "chromosome-specific" and "donor-specific," respectively). Conditions for
amplification were: 94°C, 2 min, followed by 40 cycles of 94°C, 30 sec, 60°C, 1 min,
72°C, 1 min; and a final step of 72°C, 7 min.

The results of this analysis (Figure 5) indicate that a 400 base pair
amplification product (labeled "Chimeric DNA" in the Figure) was obtained only
with DNA extracted from cells which had been transfected with the donor plasmid
and both ZFP-FokI plasmids.



Table 3: Amplification Primers for the hSMC1L1 Gene

Initial amp 1         AGCAACAACTCCTCCGGGGATC (SEQ ID NO: 37)
Initial amp 2         TTCCAGACGCGACTCTTTGGC (SEQ ID NO: 38)
Chromosome-specific   CTCAGCAAGCGTGAGCTCAGGTCTC (SEQ ID NO: 39)
Donor-specific        CAATCAGTTTCAGGAAGCTTCTT (SEQ ID NO: 40)
Outside 1             CTCAGCAAGCGTGAGCTCAGGTCTC (SEQ ID NO: 41)
Outside 2             GGGGTCAAGTAAGGCTGGGAAGC (SEQ ID NO: 42)



To confirm this result, two additional experiments were conducted. First, the
amplification product was cloned into pCR4Blunt-Topo (Invitrogen) and its
nucleotide sequence was determined. As shown in Figure 6 (SEQ ID NO: 6), the
amplified sequence obtained from chromosomal DNA of cells transfected with the
two ZFP-FokI-encoding plasmids and the donor plasmid contains the AAGAAGC
sequence that is unique to the donor (nucleotides 395-401 of the sequence presented
in Figure 6) covalently linked to chromosomal sequences not present in the donor
molecule (nucleotides 32-97 of Figure 6), indicating that donor sequences have been
recombined into the chromosome. In particular, the G→A mutation converting the
initiation codon to an isoleucine codon is observed at position 395 in the sequence.

In a second experiment, chromosomal DNA from cells transfected only with
donor plasmid, cells transfected with both ZFP-FokI fusion plasmids, cells transfected
with the donor plasmid and both ZFP-FokI fusion plasmids or cells transfected with
the EGFP control plasmid was used as template for amplification, using primers
complementary to sequences outside of the 700-nucleotide region of homology
between donor and chromosomal sequences (identified as "Outside 1" and "Outside
2" in Table 3). The resulting amplification product was purified and used as template
for a second amplification reaction using the donor-specific and chromosome-specific
primers described above (Table 3). This amplification yielded a 400 nucleotide
product only from cells transfected with the donor construct and both ZFP-FokI
fusion constructs, a result consistent with the replacement of genomic sequences by
targeted recombination in these cells.

Example 2: Editing of a Chromosomal IL2Rγ Gene by Targeted
Recombination

The IL-2Rγ gene encodes a protein, known as the "common cytokine receptor
gamma chain," that functions as a subunit of several interleukin receptors (including
IL-2R, IL-4R, IL-7R, IL-9R, IL-15R and IL-21R). Mutations in this gene, including
those surrounding the 5' end of the third exon (e.g., the tyrosine 91 codon), can cause
X-linked severe combined immunodeficiency (SCID). See, for example, Puck et al.
(1997) Blood 89:1968-1977. A mutation in the tyrosine 91 codon (nucleotides 23-25
of SEQ ID NO: 7; Figure 7) was introduced into the IL2Rγ gene by targeted cleavage
and recombination. Cleavage was targeted to this region by designing two pairs of
zinc finger proteins. The first pair (first two rows of Table 4) comprises a zinc finger
protein designed to bind to nucleotides 29-40 (primary contacts along the top strand
as shown in Figure 7) and a zinc finger protein designed to bind to nucleotides 8-20
(primary contacts along the bottom strand). The second pair (third and fourth rows of
Table 4) comprises two zinc finger proteins, the first of which recognizes nucleotides
23-34 (primary contacts along the top strand as shown in Figure 7) and the second of
which recognizes nucleotides 8-16 (primary contacts along the bottom strand). Zinc
finger proteins were designed as described in co-owned US Patents 6,453,242 and
6,534,261. See Table 4 for the amino acid sequences of the recognition regions of the
zinc finger proteins.

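
The nucleotide ranges given for each ZFP pair determine how many base pairs lie
between the two binding sites. As a rough illustration that is not part of the
original disclosure, the sketch below computes that separation, assuming the quoted
ranges are inclusive positions in the numbering of Figure 7. The helper name
`gap_between` is hypothetical.

```python
def gap_between(site_a, site_b):
    """Base pairs lying strictly between two inclusive (start, end) ranges."""
    (_, left_end), (right_start, _) = sorted([site_a, site_b])
    return right_start - left_end - 1

# Binding-site coordinates quoted in the text (assumed inclusive).
zfp_pairs = {
    "first pair": ((8, 20), (29, 40)),
    "second pair": ((8, 16), (23, 34)),
}

gaps = {name: gap_between(*sites) for name, sites in zfp_pairs.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap} bp between binding sites")
```

Under that assumption the first pair is separated by 8 base pairs and the second
pair by 6 base pairs.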
Sequences encoding the ZFP binding domains were fused to sequences
encoding a FokI cleavage half-domain (amino acids 384-579 of the native FokI
sequence, Kita et al., supra), such that the encoded protein contained FokI sequences
at the carboxy terminus and ZFP sequences at the amino terminus. Each of these
fusion sequences was then cloned in a modified mammalian expression vector
pcDNA3. See Figure 8 for a schematic diagram of the constructs.



Table 4: Zinc Finger Designs for the IL2Rγ Gene

Target sequence                F1                       F2                       F3                       F4
AACTCGGATAAT (SEQ ID NO: 43)   DRSTLIE (SEQ ID NO: 44)  SSSNLSR (SEQ ID NO: 45)  RSDDLSK (SEQ ID NO: 46)  DNSNRIK (SEQ ID NO: 47)
TAGAGGaGAAAGG (SEQ ID NO: 48)  RSDNLSN (SEQ ID NO: 49)  TSSSRIN (SEQ ID NO: 50)  RSDHLSQ (SEQ ID NO: 51)  RNADRKT (SEQ ID NO: 52)
TACAAGAACTCG (SEQ ID NO: 53)   RSDDLSK (SEQ ID NO: 54)  DNSNRIK (SEQ ID NO: 55)  RSDALSV (SEQ ID NO: 56)  DNANRTK (SEQ ID NO: 57)
GGAGAAAGG (SEQ ID NO: 58)      RSDHLTQ (SEQ ID NO: 59)  QSGNLAR (SEQ ID NO: 60)  RSDHLSR (SEQ ID NO: 61)

Note: The zinc finger amino acid sequences shown above (in one-letter code) represent
residues -1 through +6, with respect to the start of the alpha-helical portion of each zinc finger. Finger
F1 is closest to the amino terminus of the protein.



A donor DNA molecule was obtained as follows. First, a 700 base pair
fragment of human DNA corresponding to positions 69196910-69197609 on the
strand of the X chromosome (UCSC, July 2003), which includes exon 3 of the
IL2Rγ gene, was amplified, using genomic DNA from K562 cells as template. See
Figure 9. Sequences of primers used for amplification are shown in Table 5 (labeled
initial amp 1 and initial amp 2). The PCR product was then altered via standard
overlap extension PCR methodology (Ho et al., supra) to replace the sequence
TACAAGAACTCGGATAAT (SEQ ID NO: 62) with the sequence
TAAAAGAATTCCGACAAC (SEQ ID NO: 63). This replacement results in the
introduction of a point mutation at nucleotide 25 (Figure 7), converting the tyrosine
91 codon TAC to a TAA termination codon, and enables discrimination between
donor-derived and endogenous chromosomal sequences following recombination,
because of differences in the sequences downstream of codon 91. The resulting 700
base pair fragment was cloned into pCR4BluntTopo, which does not contain any
sequences homologous to the human genome. See Figure 10.

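
The effect of this replacement on the encoded protein can be illustrated by
translating both 18-nucleotide sequences. The sketch below is not part of the
original disclosure; it uses the standard genetic code and assumes the reading
frame begins at the first nucleotide shown, which follows from the statement that
TAC is the tyrosine 91 codon. Variable and function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str) -> str:
    """Translate complete codons of an upper-case DNA string ('*' = stop)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

chromosomal = "TACAAGAACTCGGATAAT"  # SEQ ID NO: 62
donor = "TAAAAGAATTCCGACAAC"        # SEQ ID NO: 63

# 1-based positions at which the donor differs from the chromosome.
changed = [i + 1 for i, (a, b) in enumerate(zip(chromosomal, donor)) if a != b]

print("chromosomal:", translate(chromosomal))
print("donor:      ", translate(donor))
print("changed positions:", changed)
```

In this reading frame the chromosomal sequence translates to YKNSDN and the donor
to *KNSDN. Only the first codon changes meaning (tyrosine to stop), while the four
downstream changes fall at third codon positions and leave the encoded amino acids
unchanged, so they serve purely as sequence markers for the donor.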
For targeted mutation of the chromosomal IL2Rγ gene, the donor plasmid,
along with two plasmids each encoding one of a pair of ZFP-FokI fusions, were
introduced into 2 x 10^6 K562 cells using mixed lipofection/electroporation (Amaxa).
Each of the ZFP/FokI pairs (see Table 4) was tested in separate experiments.
Controls included cells transfected only with two plasmids encoding ZFP-FokI
fusions, and cells transfected only with the donor plasmid. Cells were cultured in 5%
CO2 at 37°C. At 48 hours after transfection, genomic DNA was isolated from the
cells, and 200 ng was used as template for PCR amplification, using one primer
complementary to a region of the gene outside of its region of homology with the
donor sequences (nucleotides 69196839-69196863 on the strand of the X
chromosome; UCSC, July 2003), and a second primer complementary to a region of
the donor molecule into which distinguishing mutations were introduced (see above)
and whose sequence therefore diverges from that of chromosomal DNA. See Table 5
for primer sequences, labeled "chromosome-specific" and "donor-specific,"
respectively. Using these two primers, an amplification product of 500 bp is obtained
from genomic DNA in which a targeted recombination event has occurred.
Conditions for amplification were: 94°C, 2 min, followed by 35 cycles of 94°C, 30
sec, 62°C, 1 min, 72°C, 45 sec; and a final step of 72°C, 7 min.

The results of this analysis (Figure 11) indicate that an amplification product
of the expected size (500 base pairs) is obtained with DNA extracted from cells which
had been transfected with the donor plasmid and either of the pairs of
ZFP-FokI-encoding plasmids. DNA from cells transfected with plasmids encoding a pair of
ZFPs only (no donor plasmid) did not result in generation of the 500 bp product, nor
did DNA from cells transfected only with the donor plasmid.



Table 5: Amplification Primers for the IL2Rγ Gene

Initial amp 1         TGTCGAGTACATGAATTGCACTTGG (SEQ ID NO: 64)
Initial amp 2         TTAGGTTCTCTGGAGCCCAGGG (SEQ ID NO: 65)
Chromosome-specific   CTCCAAACAGTGGTTCAAGAATCTG (SEQ ID NO: 66)
Donor-specific        TCCTCTAGGTAAAGAATTCCGACAAC (SEQ ID NO: 67)

To confirm this result, the amplification product obtained from the experiment
using the second pair of ZFP/FokI fusions was cloned into pCR4Blunt-Topo
(Invitrogen) and its nucleotide sequence was determined. As shown in Figure 12
(SEQ ID NO: 12), the sequence consists of a fusion between chromosomal sequences
and sequences from the donor plasmid. In particular, the G to A mutation converting
tyrosine 91 to a stop codon is observed at position 43 in the sequence. Positions 43-58
contain nucleotides unique to the donor; nucleotides 32-42 and 59-459 are
sequences common to the donor and the chromosome, and nucleotides 460-552 are
unique to the chromosome. The presence of donor-unique sequences covalently
linked to sequences present in the chromosome but not in the donor indicates that
DNA from the donor plasmid was introduced into the chromosome by homologous
recombination.

Example 3: Editing of a Chromosomal β-globin Gene by Targeted
Recombination

The human beta globin gene is one of two gene products responsible for the
structure and function of hemoglobin in adult human erythrocytes. Mutations in the
beta-globin gene can result in sickle cell anemia. Two zinc finger proteins were
designed to bind within this sequence, near the location of a nucleotide which, when
mutated, causes sickle cell anemia. Figure 13 shows the nucleotide sequence of a
portion of the human beta-globin gene, and the target sites for the two zinc finger
proteins are underlined in the sequence presented in Figure 13. Amino acid sequences
of the recognition regions of the two zinc finger proteins are shown in Table 6.
Sequences encoding each of these two ZFP binding domains were fused to sequences
encoding a FokI cleavage half-domain, as described above, to create engineered
ZFP-nucleases that targeted the endogenous beta globin gene. Each of these fusion
sequences was then cloned in the mammalian expression vector pcDNA3.1 (Figure
14).

Table 6: Zinc Finger Designs for the beta-globin Gene

Target sequence               F1                       F2                       F3                       F4
GGGCAGTAACGG (SEQ ID NO: 68)  RSDHLSE (SEQ ID NO: 69)  QSANRTK (SEQ ID NO: 70)  RSDNLSA (SEQ ID NO: 71)  RSQNRTR (SEQ ID NO: 72)
AAGGTGAACGTG (SEQ ID NO: 73)  RSDSLSR (SEQ ID NO: 74)  DSSNRKT (SEQ ID NO: 75)  RSDSLSA (SEQ ID NO: 76)  RNDNRKT (SEQ ID NO: 77)

Note: The zinc finger amino acid sequences shown above (in one-letter code) represent
residues -1 through +6, with respect to the start of the alpha-helical portion of each zinc finger. Finger
F1 is closest to the amino terminus of the protein, and Finger F4 is closest to the carboxy terminus.



A donor DNA molecule was obtained as follows. First, a 700 base pair
fragment of human genomic DNA corresponding to nucleotides 5212134-5212833
on the "-" strand of Chromosome 11 (BLAT, UCSC Human Genome site) was
amplified by PCR, using genomic DNA from K562 cells as template. Sequences of
primers used for amplification are shown in Table 7 (labeled initial amp 1 and initial
amp 2). The resulting amplified fragment contains sequences corresponding to the
promoter, the first two exons and the first intron of the human beta globin gene. See
Figure 15 for a schematic illustrating the locations of exons 1 and 2, the first intron,
and the primer binding sites in the beta globin sequence. The cloned product was then
further modified by PCR to introduce a set of sequence changes between nucleotides
305-336 (as shown in Figure 13), which replaced the sequence
CCGTTACTGCCCTGTGGGGCAAGGTGAACGTG (SEQ ID NO: 78) with
gCGTTAgTGCCCGAATTCCGAtcGTcAACcac (SEQ ID NO: 79) (changes in
bold). Certain of these changes (shown in lowercase) were specifically engineered to
prevent the ZFP/FokI fusion proteins from binding to and cleaving the donor
sequence, once integrated into the chromosome. In addition, all of the sequence
changes enable discrimination between donor and endogenous chromosomal
sequences following recombination. The resulting 700 base pair fragment was cloned
into pCR4-TOPO, which does not contain any sequences homologous to the human
genome (Figure 16).
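
The relationship between the two 32-nucleotide sequences and the ZFP target sites
of Table 6 can be examined with a short script. This is an illustrative sketch, not
part of the original disclosure. It lists the positions at which the donor differs
from the chromosomal sequence and checks whether either ZFP target site (on either
strand) is still present in the donor. The helper names are assumptions.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (case-insensitive input)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

chromosomal = "CCGTTACTGCCCTGTGGGGCAAGGTGAACGTG"  # SEQ ID NO: 78
donor = "gCGTTAgTGCCCGAATTCCGAtcGTcAACcac"        # SEQ ID NO: 79
zfp_targets = ["GGGCAGTAACGG", "AAGGTGAACGTG"]    # Table 6 target sequences
donor_primer = "GGTTGACGATCGGAATTC"               # Table 7, donor-specific

def contains_site(region: str, site: str) -> bool:
    """True if the site occurs on either strand of the region."""
    region = region.upper()
    return site in region or reverse_complement(site) in region

# 1-based positions of all differences, and of the lowercase (ZFP-blocking) ones.
differences = [i + 1 for i, (a, b) in enumerate(zip(chromosomal, donor.upper())) if a != b]
lowercase = [i + 1 for i, base in enumerate(donor) if base.islower()]

print("positions changed:", differences)
print("lowercase changes:", lowercase)
for site in zfp_targets:
    print(site, "in chromosome:", contains_site(chromosomal, site),
          "| in donor:", contains_site(donor, site))
print("primer matches donor:", reverse_complement(donor_primer) in donor.upper())
```

As written, the sketch finds both target sites in the chromosomal sequence and
neither in the donor, and the lowercase changes fall within the two target sites,
in keeping with the statement that those changes were engineered to prevent
binding and cleavage of the donor.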

For targeted mutation of the chromosomal beta globin gene, the two plasmids
encoding ZFP-FokI fusions and the donor plasmid (pCR4-TOPO-HBBdonor) were
introduced into 1 x 10^6 K562 cells by transfection using Nucleofector™ Solution
(Amaxa Biosystems). Controls included cells transfected only with 100 ng (low) or
200 ng (high) of the two plasmids encoding the ZFP-FokI fusions, cells transfected
only with 200 ng (low) or 600 ng (high) of the donor plasmid, cells transfected with a
GFP-encoding plasmid, and mock transfected cells. Cells were cultured in RPMI
Medium 1640 (Invitrogen), supplemented with 10% fetal bovine serum (FBS)
(HyClone) and 2 mM L-glutamine. Cells were maintained at 37°C in an atmosphere
of 5% CO2. At 72 hours after transfection, genomic DNA was isolated from the cells,
and 200 ng was used as template for PCR amplification, using one primer
complementary to a region of the gene outside of its region of homology with the
donor sequences (nucleotides 5212883-5212905 on the "-" strand of chromosome 11),
and a second primer complementary to a region of the donor molecule into which
distinguishing mutations were introduced (see supra). The
sequences of these primers are given in Table 7 (labeled "chromosome-specific" and
"donor-specific," respectively). Using these two primers, an amplification product of
415 base pairs will be obtained from genomic DNA if a targeted recombination event
has occurred. As a control for DNA loading, PCR reactions were also carried out
using the Initial amp 1 and Initial amp 2 primers to ensure that similar levels of
genomic DNA were added to each PCR reaction. Conditions for amplification were:
95°C, 2 min, followed by 40 cycles of 95°C, 30 sec, 60°C, 45 sec, 68°C, 2 min; and a
final step of 68°C, 10 min.

The results of this analysis (Figure 17) indicate that a 415 base pair
amplification product was obtained only with DNA extracted from cells which had
been transfected with the "high" concentration of donor plasmid and both ZFP-FokI
plasmids, consistent with targeted recombination of donor sequences into the
chromosomal beta-globin locus.

Table 7: Amplification Primers for the human beta globin gene

Initial amp 1         TACTGATGGTATGGGGCCAAGAG (SEQ ID NO: 80)
Initial amp 2         CACGTGCAGCTTGTCACAGTGC (SEQ ID NO: 81)
Chromosome-specific   TGCTTACCAAGCTGTGATTCCA (SEQ ID NO: 82)
Donor-specific        GGTTGACGATCGGAATTC (SEQ ID NO: 83)



To confirm this result, the amplification product was cloned into pCR4-TOPO
(Invitrogen) and its nucleotide sequence was determined. As shown in Figure 18
(SEQ ID NO: 14), the sequence consists of a fusion between chromosomal sequences
not present on the donor plasmid and sequences unique to the donor plasmid. For
example, two C→G mutations which disrupt ZFP-binding are observed at positions
377 and 383 in the sequence. Nucleotides 377-408 represent sequence obtained from
the donor plasmid containing the sequence changes described above; nucleotides 73-376
are sequences common to the donor and the chromosome, and nucleotides 1-72
are unique to the chromosome. The covalent linkage of donor-specific and
chromosome-specific sequences in the genome confirms the successful recombination
of the donor sequence at the correct locus within the genome of K562 cells.



Example 4: ZFP-FokI linker (ZC linker) optimization

In order to test the effect of ZC linker length on cleavage efficiency, a
four-finger ZFP binding domain was fused to a FokI cleavage half-domain, using ZC
linkers of various lengths. The target site for the ZFP is 5'-AACTCGGATAAT-3'
(SEQ ID NO:84) and the amino acid sequences of the recognition regions (positions
-1 through +6 with respect to the start of the alpha-helix) of each of the zinc fingers
were as follows (wherein F1 is the N-most, and F4 is the C-most zinc finger):

F1: DRSTLIE (SEQ ID NO:85)

F2: SSSNLSR (SEQ ID NO:86)

F3: RSDDLSK (SEQ ID NO:87)

F4: DNSNRIK (SEQ ID NO:88)

ZFP-FokI fusions, in which the aforementioned ZFP binding domain and a
FokI cleavage half-domain were separated by 2, 3, 4, 5, 6, or 10 amino acid residues,
were constructed. Each of these proteins was tested for cleavage of substrates having
an inverted repeat of the ZFP target site, with repeats separated by 4, 5, 6, 7, 8, 9, 12,
15, 16, 17, 22, or 26 basepairs.
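
The geometry of these substrates (an inverted copy of the target site, a spacer,
then a direct copy) can be illustrated as follows. This sketch is not part of the
original disclosure. It measures the spacer in the top strand of the 4 bp and
5 bp substrates listed in this Example (SEQ ID NOs: 95 and 96); the function name
`spacer_length` is hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

TARGET = "AACTCGGATAAT"  # ZFP target site (SEQ ID NO: 84)

def spacer_length(top_strand: str, target: str = TARGET) -> int:
    """Base pairs between the inverted and the direct copy of the target site."""
    inverted = reverse_complement(target)
    left_end = top_strand.index(inverted) + len(inverted)
    right_start = top_strand.index(target, left_end)
    return right_start - left_end

# Top strands of two substrates from this Example, keyed by stated separation.
substrates = {
    4: "CTAGCATTATCCGAGTTACACAACTCGGATAATGCTAG",   # SEQ ID NO: 95
    5: "CTAGCATTATCCGAGTTCACACAACTCGGATAATGCTAG",  # SEQ ID NO: 96
}

for stated, top in substrates.items():
    print(f"stated {stated} bp, measured {spacer_length(top)} bp")
```

Because the left-hand site is the reverse complement of the right-hand site, two
fusion proteins bound to such a substrate have their FokI half-domains pointing
toward the spacer from opposite strands.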

The amino acid sequences of the fusion constructs, in the region of the
ZFP-FokI junction (with the ZC linker sequence set off by spaces), are as follows:

10-residue linker HTKIH LRQKDAARGS QLV (SEQ ID NO:89)

6-residue linker HTKIH LRQKGS QLV (SEQ ID NO:90)

5-residue linker HTKIH LRQGS QLV (SEQ ID NO:91)

4-residue linker HTKIH LRGS QLV (SEQ ID NO:92)

3-residue linker HTKIH LGS QLV (SEQ ID NO:93)

2-residue linker HTKIH GS QLV (SEQ ID NO:94)



The sequences of the various cleavage substrates, with the ZFP target sites
set off by spaces, are as follows:

4bp separation
CTAGC ATTATCCGAGTT ACAC AACTCGGATAAT GCTAG
GATCG TAATAGGCTCAA TGTG TTGAGCCTATTA CGATC
(SEQ ID NO: 95)

5bp separation
CTAGC ATTATCCGAGTT CACAC AACTCGGATAAT GCTAG
GATCG TAATAGGCTCAA GTGTG TTGAGCCTATTA CGATC
(SEQ ID NO: 96)

6bp separation
CTAGGC ATTATCCGAGTT CACCAC AACTCGGATAAT GACTAG
GATCCG TAATAGGCTCAA GTGGTG TTGAGCCTATTA CTGATC
(SEQ ID NO: 97)

7bp separation
CTAGC ATTATCCGAGTT CACACAC AACTCGGATAAT GCTAG
GATCG TAATAGGCTCAA GTGTGTG TTGAGCCTATTA CGATC
(SEQ ID NO: 98)

8bp separation
CTAGC ATTATCCGAGTT CACCACAC AACTCGGATAAT GCTAG
GATCG TAATAGGCTCAA GTGGTGTG TTGAGCCTATTA CGATC
(SEQ ID NO: 99)

9bp separation
CTAGC ATTATCCGAGTT CACACACAC AACTCGGATAAT GCTAG
GATCG TAATAGGCTCAA GTGTGTGTG TTGAGCCTATTA CGATC
(SEQ ID NO: 100)

12bp separation
CTAGC ATTATCCGAGTT CACCACCAACAC AACTCGGATAAT GCTAG
GATCG TAATAGGCTCAA GTGGTGGTTGTG TTGAGCCTATTA CGATC
(SEQ ID NO: 101)

15bp separation
CTAGC ATTATCCGAGTT CACCACCAACCACAC AACTCGGATAAT GCTAG
GATCG TAATAGGCTCAA GTGGTGGTTGGTGTG TTGAGCCTATTA CGATC
(SEQ ID NO: 102)

16bp separation
CTAGC ATTATCCGAGTT CACCACCAACCACACC AACTCGGATAAT GCTAG
GATCG TAATAGGCTCAA GTGGTGGTTGGTGTGG TTGAGCCTATTA CGATC
(SEQ ID NO: 103)

17bp separation
CTAGC ATTATCCGAGTT CAACCACCAACCACACC AACTCGGATAAT GCTAG
GATCG TAATAGGCTCAA GTTGGTGGTTGGTGTGG TTGAGCCTATTA CGATC
(SEQ ID NO: 104)

22bp separation
CTAGC ATTATCCGAGTT CAACCACCAACCACACCAACAC AACTCGGATAAT GCTAG
GATCG TAATAGGCTCAA GTTGGTGGTTGGTGTGGTTGTG TTGAGCCTATTA CGATC
(SEQ ID NO: 105)

26bp separation
CTAGC ATTATCCGAGTT CAACCACCAACCACACCAACACCACC AACTCGGATAAT GCTAG
GATCG TAATAGGCTCAA GTTGGTGGTTGGTGTGGTTGTGGTGG TTGAGCCTATTA CGATC
(SEQ ID NO: 106)
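
By way of illustration only, the geometry of these substrates can be checked computationally. The following Python sketch locates the inverted copy of the ZFP target site and the target site itself on the upper strand, and reports the number of basepairs separating them. The function and variable names are hypothetical and form no part of the disclosure; the strands are those of SEQ ID NO:95 and SEQ ID NO:100 as listed above, written without spaces.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement, read in the same left-to-right order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return complement(seq)[::-1]


def spacer_length(top_strand: str, site: str) -> int:
    """Basepairs between the inverted copy of `site` and `site` itself."""
    inverted = reverse_complement(site)
    left = top_strand.find(inverted)
    if left < 0:
        raise ValueError("inverted copy of the target site not found")
    spacer_start = left + len(inverted)
    right = top_strand.find(site, spacer_start)
    if right < 0:
        raise ValueError("target site not found downstream of its inverted copy")
    return right - spacer_start


TARGET_SITE = "AACTCGGATAAT"  # SEQ ID NO:84
SUBSTRATE_4BP_TOP = "CTAGCATTATCCGAGTTACACAACTCGGATAATGCTAG"     # SEQ ID NO:95
SUBSTRATE_4BP_BOTTOM = "GATCGTAATAGGCTCAATGTGTTGAGCCTATTACGATC"  # printed 3' to 5'
SUBSTRATE_9BP_TOP = "CTAGCATTATCCGAGTTCACACACACAACTCGGATAATGCTAG"  # SEQ ID NO:100

if __name__ == "__main__":
    print(spacer_length(SUBSTRATE_4BP_TOP, TARGET_SITE))
    print(spacer_length(SUBSTRATE_9BP_TOP, TARGET_SITE))
    print(complement(SUBSTRATE_4BP_TOP) == SUBSTRATE_4BP_BOTTOM)
```

Because the lower strand of each substrate is printed 3' to 5' beneath the upper strand, it is compared with the plain complement rather than the reverse complement.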



Plasmids encoding the different ZFP-FokI fusion proteins (see above) were
constructed by standard molecular biological techniques, and an in vitro coupled
transcription/translation system was used to express the encoded proteins. For each
construct, 200 ng linearized plasmid DNA was incubated in 20 µl TnT mix
at 30°C for 1 hour and 45 minutes. TnT mix contains 100 µl TnT lysate
(Promega, Madison, WI) with 4 µl T7 RNA polymerase (Promega) + 2 µl Methionine
(1 mM) + 2.5 µl ZnCl2 (20 mM).

For analysis of DNA cleavage by the different ZFP-FokI fusions, 1 µl of the
coupled transcription/translation reaction mixture was combined with approximately
1 ng DNA substrate (end-labeled with 32P using T4 polynucleotide kinase), and the
mixture was diluted to a final volume of 19 µl with FokI Cleavage Buffer. FokI
Cleavage Buffer contains 20 mM Tris-HCl pH 8.5, 75 mM NaCl, 10 µM ZnCl2, 1 mM
DTT, 5% glycerol, 500 µg/ml BSA. The mixture was incubated for 1 hour at 37°C.
6.5 µl of FokI buffer, also containing 8 mM MgCl2, was then added and incubation
was continued for one hour at 37°C. Protein was extracted by adding 10 µl
phenol-chloroform solution to each reaction, mixing, and centrifuging to separate the phases.
Ten microliters of the aqueous phase from each reaction was analyzed by
electrophoresis on a 10% polyacrylamide gel.

The gel was subjected to autoradiography, and the cleavage efficiency for each
ZFP-FokI fusion/substrate pair was calculated by quantifying the radioactivity in
bands corresponding to uncleaved and cleaved substrate, summing to obtain total
radioactivity, and determining the percentage of the total radioactivity present in the
bands representing cleavage products.
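
The calculation just described can be sketched as follows. This is a minimal Python illustration only: the function name and the band intensities in the usage lines are hypothetical, and any consistent unit of band radioactivity may be used.

```python
from typing import Sequence


def cleavage_efficiency(uncleaved: float, cleaved: Sequence[float]) -> float:
    """Percentage of total radioactivity found in the cleavage-product bands."""
    cleaved_total = sum(cleaved)
    total = uncleaved + cleaved_total
    if total <= 0:
        raise ValueError("no radioactivity detected in any band")
    return 100.0 * cleaved_total / total


if __name__ == "__main__":
    # Hypothetical band intensities: one uncleaved band, two product bands.
    print(cleavage_efficiency(260.0, [370.0, 370.0]))
```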

The results of this experiment are shown in Table 8. These data allow the
selection of a ZC linker that provides optimum cleavage efficiency for a given target
site separation. These data also allow the selection of linker lengths that allow
cleavage at a selected pair of target sites, but discriminate against cleavage at the
same or similar ZFP target sites that have a separation that is different from that at the
intended cleavage site.

Table 8: DNA cleavage efficiency for various ZC linker lengths and various binding site
separations*

Separation   2-residue   3-residue   4-residue   5-residue   6-residue   10-residue
4 bp            74%         81%         74%         12%          6%           4%
5 bp            61%         89%         92%         80%         53%          40%
6 bp            78%         89%         95%         91%         93%          76%
7 bp            15%         55%         80%         80%         70%          80%
8 bp             0%          0%          8%         11%         22%          63%
9 bp             2%          6%         23%          9%         13%          51%
12 bp            8%         12%         22%         40%         69%          84%
15 bp           73%         78%         97%         92%         95%          88%
16 bp           59%         89%        100%         97%         90%          86%
17 bp            5%         22%         77%         71%         85%          82%
22 bp            1%          3%          5%          8%         18%          58%
26 bp            1%          2%         35%         36%         84%          78%

* The columns represent different ZFP-FokI fusion constructs with the indicated number of residues
separating the ZFP and the FokI cleavage half-domain. The rows represent different DNA substrates
with the indicated number of basepairs separating the inverted repeats of the ZFP target site.
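
As a hedged illustration of how the data of Table 8 might be used for linker selection, the following Python sketch encodes the table and returns (a) the linker length giving the highest cleavage efficiency at a given separation and (b) the linker lengths that cleave at an intended separation while disfavoring an unintended one. The function names and the threshold values (70% and 15%) are assumptions chosen for the example and are not taken from the disclosure.

```python
LINKER_LENGTHS = (2, 3, 4, 5, 6, 10)  # residues in the ZC linker (columns)

# Percent cleavage from Table 8, keyed by binding-site separation in bp (rows).
TABLE_8 = {
    4: (74, 81, 74, 12, 6, 4),
    5: (61, 89, 92, 80, 53, 40),
    6: (78, 89, 95, 91, 93, 76),
    7: (15, 55, 80, 80, 70, 80),
    8: (0, 0, 8, 11, 22, 63),
    9: (2, 6, 23, 9, 13, 51),
    12: (8, 12, 22, 40, 69, 84),
    15: (73, 78, 97, 92, 95, 88),
    16: (59, 89, 100, 97, 90, 86),
    17: (5, 22, 77, 71, 85, 82),
    22: (1, 3, 5, 8, 18, 58),
    26: (1, 2, 35, 36, 84, 78),
}


def best_linker(separation: int) -> int:
    """Linker length with the highest efficiency; ties go to the shorter linker."""
    row = TABLE_8[separation]
    return LINKER_LENGTHS[row.index(max(row))]


def discriminating_linkers(wanted: int, unwanted: int,
                           min_on: int = 70, max_off: int = 15) -> list:
    """Linkers that cleave at `wanted` (>= min_on) but not at `unwanted` (<= max_off)."""
    on_row, off_row = TABLE_8[wanted], TABLE_8[unwanted]
    return [length for length, on, off in zip(LINKER_LENGTHS, on_row, off_row)
            if on >= min_on and off <= max_off]


if __name__ == "__main__":
    print(best_linker(8))
    print(discriminating_linkers(wanted=6, unwanted=8))
```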

For ZFP-FokI fusions with four residue linkers, the amino acid sequence of
the linker was also varied. In separate constructs, the original LRGS linker sequence
(SEQ ID NO:107) was changed to LGGS (SEQ ID NO:108), TGGS (SEQ ID
NO:109), GGGS (SEQ ID NO:110), LPGS (SEQ ID NO:111), LRKS (SEQ ID
NO:112), and LRWS (SEQ ID NO:113); and the resulting fusions were tested on
substrates having a six-basepair separation between binding sites. Fusions containing
the LGGS (SEQ ID NO:108) linker sequence were observed to cleave more
efficiently than those containing the original LRGS sequence (SEQ ID NO:107).
Fusions containing the LRKS (SEQ ID NO:112) and LRWS (SEQ ID NO:113)
sequences cleaved with less efficiency than the LRGS sequence (SEQ ID NO:107),
while the cleavage efficiencies of the remaining fusions were similar to that of the
fusion comprising the original LRGS sequence (SEQ ID NO:107).

Example 5: Increased cleavage specificity resulting from alteration of the
FokI cleavage half-domain in the dimerization interface

A pair of ZFP/FokI fusion proteins (denoted 5-8 and 5-10) were designed to
bind to target sites in the fifth exon of the IL-2Rγ gene, to promote cleavage in the
region between the target sites. The relevant region of the gene, including the target
sequences of the two fusion proteins, is shown in Figure 19. The amino acid
sequence of the 5-8 protein is shown in Figure 20, and the amino acid sequence of the
5-10 protein is shown in Figure 21. Both proteins contain a 10 amino acid ZC linker.
With respect to the zinc finger portion of these proteins, the DNA target sequences, as
well as amino acid sequences of the recognition regions in the zinc fingers, are given
in Table 9.



Table 9: Zinc Finger Designs for the IL-2Rγ Gene

Fusion   Target sequence    F1                F2                F3                F4
5-8      ACTCTGTGGAAG       RSDNLSE           RNAHRIN           RSDTLSE           ARSTRTT
         (SEQ ID NO:114)    (SEQ ID NO:115)   (SEQ ID NO:116)   (SEQ ID NO:117)   (SEQ ID NO:118)
5-10     AACACGaAACGTG      RSDSLSR           DSSNRKT           RSDSLSV           DRSNRIT
         (SEQ ID NO:119)    (SEQ ID NO:120)   (SEQ ID NO:121)   (SEQ ID NO:122)   (SEQ ID NO:123)

Note: The zinc finger amino acid sequences shown above (in one-letter code) represent
residues -1 through +6, with respect to the start of the alpha-helical portion of each zinc finger. Finger
F1 is closest to the amino terminus of the protein.

The ability of this pair of fusion proteins to catalyze specific cleavage of DNA
between their target sequences (see Figure 19) was tested in vitro using a labeled
DNA template containing the target sequence and assaying for the presence of
diagnostic digestion products. Specific cleavage was obtained when both proteins
were used (Table 10, first row). However, the 5-10 fusion protein (comprising a
wild-type FokI cleavage half-domain) was also capable of aberrant cleavage at a non-target
site in the absence of the 5-8 protein (Table 10, second row), possibly due to
self-dimerization.

Accordingly, 5-10 was modified in its FokI cleavage half-domain by
converting amino acid residue 490 from glutamic acid (E) to lysine (K). (Numbering
of amino acid residues in the FokI protein is according to Wah et al., supra.) This
modification was designed to prevent homodimerization by altering an amino acid
residue in the dimerization interface. The 5-10 (E490K) mutant, unlike the parental
5-10 protein, was unable to cleave at aberrant sites in the absence of the 5-8 fusion
protein (Table 10, Row 3). However, the 5-10 (E490K) mutant, together with the 5-8
protein, catalyzed specific cleavage of the substrate (Table 10, Row 4). Thus,
alteration of a residue in the cleavage half-domain of 5-10 that is involved in
dimerization prevented aberrant cleavage by this fusion protein due to
self-dimerization. An E490R mutant also exhibits lower levels of homodimerization than
the parent protein.

In addition, the 5-8 protein was modified in its dimerization interface by
replacing the glutamine (Q) residue at position 486 with glutamic acid (E). This 5-8
(Q486E) mutant was tested for its ability to catalyze targeted cleavage in the presence
of either the wild-type 5-10 protein or the 5-10 (E490K) mutant. DNA cleavage was
not observed when the labeled substrate was incubated in the presence of both 5-8
(Q486E) and wild-type 5-10 (Table 10, Row 5). However, cleavage was obtained
when the 5-8 (Q486E) and 5-10 (E490K) mutants were used in combination (Table
10, Row 6).

These results indicate that DNA cleavage by a ZFP/FokI fusion protein pair, at
regions other than that defined by the target sequences of the two fusion proteins, can
be minimized or abolished by altering the amino acid sequence of the cleavage
half-domain in one or both of the fusion proteins.



Table 10: DNA cleavage by ZFP/FokI fusion protein pairs containing wild-type
and mutant cleavage half-domains

Row   ZFP 5-8 binding domain   ZFP 5-10 binding domain   DNA cleavage
1     Wild-type FokI           Wild-type FokI            Specific
2     Not present              Wild-type FokI            Non-specific
3     Not present              FokI E490K                None
4     Wild-type FokI           FokI E490K                Specific
5     FokI Q486E               Wild-type FokI            None
6     FokI Q486E               FokI E490K                Specific

Note: Each row of the table presents results of a separate experiment in which ZFP/FokI
fusion proteins were tested for cleavage of a labeled DNA substrate. One of the fusion
proteins contained the 5-8 DNA binding domain, and the other fusion protein contained the
5-10 DNA binding domain (See Table 9 and Figure 19). The cleavage half-domain portion of
the fusion proteins was as indicated in the Table. Thus, the entries in the ZFP 5-8 column
indicate the type of FokI cleavage domain fused to ZFP 5-8; and the entries in the ZFP 5-10
column indicate the type of FokI cleavage domain fused to ZFP 5-10. For the FokI cleavage
half-domain mutants, the number refers to the amino acid residue in the FokI protein; the
letter preceding the number refers to the amino acid present in the wild-type protein and the
letter following the number denotes the amino acid to which the wild-type residue was
changed in generating the modified protein.

'Not present' indicates that the entire ZFP/FokI fusion protein was omitted from that
particular experiment.

The DNA substrate used in this experiment was an approximately 400 bp PCR product
containing the target sites for both ZFP 5-8 and ZFP 5-10. See Figure 19 for the sequences
and relative orientation of the two target sites.

Example 6: Generation of a defective enhanced Green Fluorescent
Protein (eGFP) gene

The enhanced Green Fluorescent Protein (eGFP) is a modified form of the
Green Fluorescent Protein (GFP; see, e.g., Tsien (1998) Ann. Rev. Biochem. 67:509-544)
containing changes at amino acid 64 (phe to leu) and 65 (ser to thr). Heim et al.
(1995) Nature 373:663-664; Cormack et al. (1996) Gene 173:33-38. An eGFP-based
reporter system was constructed by generating a defective form of the eGFP gene,
which contained a stop codon and a 2-bp frameshift mutation. The sequence of the
eGFP gene is shown in Figure 22. The mutations were inserted by overlapping PCR
mutagenesis, using the Platinum® Taq DNA Polymerase High Fidelity kit (Invitrogen)
and the oligonucleotides GFP-Bam, GFP-Xba, stop sense2 and stop anti2 as primers
(oligonucleotide sequences are listed below in Table 11). GFP-Bam and GFP-Xba
served as the external primers, while the primers stop sense2 and stop anti2 served as
the internal primers encoding the nucleotide changes. The peGFP-N1 vector (BD
Biosciences), encoding a full-length eGFP gene, was used as the DNA template in
two separate amplification reactions, the first utilizing the GFP-Bam and stop anti2
oligonucleotides as primers and the second using the GFP-Xba and stop sense2
oligonucleotides as primers. This generated two amplification products whose
sequences overlapped. These products were combined and used as template in a third
amplification reaction, using the external GFP-Bam and GFP-Xba oligonucleotides as
primers, to regenerate a modified eGFP gene in which the sequence GACCACAT
(SEQ ID NO:124) at nucleotides 280-287 was replaced with the sequence TAACAC
(SEQ ID NO:125). The PCR conditions for all amplification reactions were as
follows: the template was initially denatured for 2 minutes at 94 degrees C, followed
by 25 cycles of amplification by incubating the reaction for 30 sec. at 94 degrees C,
45 sec. at 46 degrees C, and 60 sec. at 68 degrees C. A final round of extension was
carried out at 68 degrees C for 10 minutes. The sequence of the final amplification
product is shown in Figure 23. This 795 bp fragment was cloned into the
pCR(R)4-TOPO vector using the TOPO-TA cloning kit (Invitrogen) to generate the
pCR(R)4-TOPO-GFPmut construct.

Table 11: Oligonucleotide sequences for GFP

Oligo         Sequence 5'-3'
GFP-Bam       CGAATTCTGCAGTCGAC (SEQ ID NO:126)
GFP-Xba       GATTATGATCTAGAGTCG (SEQ ID NO:127)
stop sense2   AGCCGCTACCCCTAACACGAAGCAG (SEQ ID NO:128)
stop anti2    CTGCTTCGTGTTAGGGGTAGCGGCT (SEQ ID NO:129)
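
By way of illustration, the internal mutagenic primers of Table 11 can be checked against the sequence change described in Example 6. The Python sketch below (all names hypothetical) confirms that stop sense2 and stop anti2 are reverse complements of one another, that both carry the replacement sequence TAACAC, and that replacing GACCACAT (8 nucleotides) with TAACAC (6 nucleotides) shortens the gene by 2 bp while introducing a TAA triplet. The unmutated fragment is reconstructed here by reversing the replacement within the primer; it is an inference from the text, not a sequence read from Figure 22.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


STOP_SENSE2 = "AGCCGCTACCCCTAACACGAAGCAG"  # SEQ ID NO:128
STOP_ANTI2 = "CTGCTTCGTGTTAGGGGTAGCGGCT"   # SEQ ID NO:129
ORIGINAL = "GACCACAT"                      # SEQ ID NO:124, nucleotides 280-287
REPLACEMENT = "TAACAC"                     # SEQ ID NO:125

# Inferred unmutated fragment: the primer with the replacement undone.
inferred_wild_type = STOP_SENSE2.replace(REPLACEMENT, ORIGINAL, 1)
mutated = inferred_wild_type.replace(ORIGINAL, REPLACEMENT, 1)
length_change = len(mutated) - len(inferred_wild_type)

if __name__ == "__main__":
    print(reverse_complement(STOP_SENSE2) == STOP_ANTI2)
    print(length_change)
    print(REPLACEMENT.startswith("TAA"))
```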

Example 7: Design and assembly of Zinc Finger Nucleases targeting eGFP

Two three-finger ZFPs were designed to bind a region of the mutated GFP
gene (Example 6) corresponding to nucleotides 271-294 (numbering according to
Figure 23). The binding sites for these proteins occur in opposite orientation with 6
base pairs separating the two binding sites. See Figure 23. ZFP 287A binds
nucleotides 271-279 on the non-coding strand, while ZFP 296 binds nucleotides
286-294 on the coding strand. The DNA target and amino acid sequence for the
recognition regions of the ZFPs are listed below, and in Table 12:

287A:
F1 (GCGg) RSDDLTR (SEQ ID NO:130)
F2 (GTA) QSGALAR (SEQ ID NO:131)
F3 (GGG) RSDHLSR (SEQ ID NO:132)

296S:
F1 (GCA) QSGSLTR (SEQ ID NO:133)
F2 (GCA) QSGDLTR (SEQ ID NO:134)
F3 (GAA) QSGNLAR (SEQ ID NO:135)



Table 12: Zinc finger designs for the GFP gene

Protein   Target sequence    F1                F2                F3
287A      GGGGTAGCGg         RSDDLTR           QSGALAR           RSDHLSR
          (SEQ ID NO:136)    (SEQ ID NO:137)   (SEQ ID NO:138)   (SEQ ID NO:139)
296S      GAAGCAGCA          QSGSLTR           QSGDLTR           QSGNLAR
          (SEQ ID NO:140)    (SEQ ID NO:141)   (SEQ ID NO:142)   (SEQ ID NO:143)

Note: The zinc finger amino acid sequences shown above (in one-letter code) represent
residues -1 through +6, with respect to the start of the alpha-helical portion of each zinc finger. Finger
F1 is closest to the amino terminus of the protein, and Finger F3 is closest to the carboxy terminus.
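
The arrangement of the two binding sites can be illustrated with a short Python sketch. The 24-nucleotide coding-strand fragment used here is an assumption reconstructed from statements in Examples 6 and 7 (the reverse complement of the 287A target at nucleotides 271-279, the replacement sequence TAACAC, and the 296S target at nucleotides 286-294); it is not copied from Figure 23, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


REGION_START = 271                        # numbering according to Figure 23
REGION = "CGCTACCCCTAACACGAAGCAGCA"       # assumed coding strand, nt 271-294

TARGET_287A = "GGGGTAGCG"  # SEQ ID NO:136 without the trailing lowercase g
TARGET_296S = "GAAGCAGCA"  # SEQ ID NO:140


def locate(target: str, on_coding_strand: bool) -> tuple:
    """First and last coding-strand nucleotide numbers covered by a target site."""
    query = target if on_coding_strand else reverse_complement(target)
    index = REGION.find(query)
    if index < 0:
        raise ValueError("target site not found in region")
    first = REGION_START + index
    return first, first + len(query) - 1


site_287a = locate(TARGET_287A, on_coding_strand=False)
site_296s = locate(TARGET_296S, on_coding_strand=True)
gap = site_296s[0] - site_287a[1] - 1  # basepairs between the two sites

if __name__ == "__main__":
    print(site_287a, site_296s, gap)
```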



Sequences encoding these proteins were generated by PCR assembly (e.g.,
U.S. Patent No. 6,534,261), cloned between the KpnI and BamHI sites of the
pcDNA3.1 vector (Invitrogen), and fused in frame with the catalytic domain of the
FokI endonuclease (amino acids 384-579 of the sequence of Looney et al. (1989)
Gene 80:193-208). The resulting constructs were named pcDNA3.1-GFP287-FokI
and pcDNA3.1-GFP296-FokI (Figure 24).


Example 8: Targeted in vitro DNA cleavage by designed Zinc Finger
Nucleases

The pCR(R)4-TOPO-GFPmut construct (Example 6) was used to provide a
template for testing the ability of the 287 and 296 zinc finger proteins to specifically
recognize their target sites and cleave this modified form of eGFP in vitro.

A DNA fragment containing the defective eGFP-encoding insert was obtained
by PCR amplification, using the T7 and T3 universal primers and
pCR(R)4-TOPO-GFPmut as template. This fragment was end-labeled using γ-32P-ATP and T4
polynucleotide kinase. Unincorporated nucleotide was removed using a microspin
G-50 column (Amersham).

An in vitro coupled transcription/translation system was used to express the
287 and 296 zinc finger nucleases described in Example 7. For each construct, 200
ng linearized plasmid DNA was incubated in 20 µl TnT mix at 30°C
for 1 hour and 45 minutes. TnT mix contains 100 µl TnT lysate (which includes T7
RNA polymerase, Promega, Madison, WI) supplemented with 2 µl Methionine (1
mM) and 2.5 µl ZnCl2 (20 mM).

For analysis of DNA cleavage, aliquots from each of the 287 and 296 coupled
transcription/translation reaction mixtures were combined, then serially diluted with
cleavage buffer. Cleavage buffer contains 20 mM Tris-HCl pH 8.5, 75 mM NaCl,
10 mM MgCl2, 10 µM ZnCl2, 1 mM DTT, 5% glycerol, 500 µg/ml BSA. 5 µl of each
dilution was combined with approximately 1 ng DNA substrate (end-labeled with 32P
using T4 polynucleotide kinase as described above), and each mixture was further
diluted to generate a 20 µl cleavage reaction having the following composition: 20
mM Tris-HCl pH 8.5, 75 mM NaCl, 10 mM MgCl2, 10 µM ZnCl2, 1 mM DTT, 5%
glycerol, 500 µg/ml BSA. Cleavage reactions were incubated for 1 hour at 37°C.
Protein was extracted by adding 10 µl phenol-chloroform solution to each reaction,
mixing, and centrifuging to separate the phases. Ten microliters of the aqueous phase
from each reaction was analyzed by electrophoresis on a 10% polyacrylamide gel.

The gel was subjected to autoradiography, and the results of this experiment
are shown in Figure 25. The four left-most lanes show the results of reactions in
which the final dilution of each coupled transcription/translation reaction mixture (in
the cleavage reaction) was 1/156.25, 1/31.25, 1/12.5 and 1/5, respectively, resulting in
effective volumes of 0.032, 0.16, 0.4 and 1 µl, respectively, of each coupled
transcription/translation reaction. The appearance of two DNA fragments having
lower molecular weights than the starting fragment (lane labeled "uncut control" in
Figure 25) is correlated with increasing amounts of the 287 and 296 zinc finger
endonucleases in the reaction mixture, showing that DNA cleavage at the expected
target site was obtained.
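
The effective volumes stated above are numerically consistent with multiplying the 5 µl aliquot by each stated dilution. The following Python lines are offered only as an arithmetic check under that reading; the variable names are hypothetical.

```python
ALIQUOT_UL = 5.0                             # volume of each dilution added per reaction
DILUTION_FACTORS = (156.25, 31.25, 12.5, 5.0)  # 1/156.25, 1/31.25, 1/12.5, 1/5

# Effective volume (µl) of each undiluted transcription/translation reaction.
effective_volumes = [ALIQUOT_UL / factor for factor in DILUTION_FACTORS]

if __name__ == "__main__":
    for factor, volume in zip(DILUTION_FACTORS, effective_volumes):
        print(f"1/{factor}: {volume:.3f} µl")
```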

Example 9: Generation of stable cell lines containing an integrated
defective eGFP gene

A DNA fragment encoding the mutated eGFP, eGFPmut, was cleaved out of
the pCR(R)4-TOPO-GFPmut vector (Example 6) and cloned into the HindIII and
NotI sites of pcDNA4/TO, thereby placing this gene under control of a
tetracycline-inducible CMV promoter. The resulting plasmid was named pcDNA4/TO/GFPmut
(Figure 26). T-Rex 293 cells (Invitrogen) were grown in Dulbecco's modified
Eagle's medium (DMEM) (Invitrogen) supplemented with 10% Tet-free fetal bovine
serum (FBS) (HyClone). Cells were plated into a 6-well dish at 50% confluence, and
two wells were each transfected with pcDNA4/TO/GFPmut. The cells were allowed
to recover for 48 hours, then cells from both wells were combined and split into
10x 15-cm² dishes in selective medium, i.e., medium supplemented with 400 µg/ml
Zeocin (Invitrogen). The medium was changed every 3 days, and after 10 days single
colonies were isolated and expanded further. Each clonal line was tested individually
for doxycycline (dox)-inducible expression of the eGFPmut gene by quantitative
RT-PCR (TaqMan®).

For quantitative RT-PCR analysis, total RNA was isolated from dox-treated
and untreated cells using the High Pure Isolation Kit (Roche Molecular
Biochemicals), and 25 ng of total RNA from each sample was subjected to real time
quantitative RT-PCR to analyze endogenous gene expression, using TaqMan® assays.
Probe and primer sequences are shown in Table 13. Reactions were carried out on an
ABI 7700 SDS machine (PerkinElmer Life Sciences) under the following conditions.
The reverse transcription reaction was performed at 48°C for 30 minutes with
MultiScribe reverse transcriptase (PerkinElmer Life Sciences), followed by a
10-minute denaturation step at 95°C. Polymerase chain reaction (PCR) was carried out
with AmpliGold DNA polymerase (PerkinElmer Life Sciences) for 40 cycles at 95°C
for 15 seconds and 60°C for 1 minute. Results were analyzed using the SDS version
1.7 software and are shown in Figure 27, with expression of the eGFPmut gene
normalized to the expression of the human GAPDH gene. A number of cell lines
exhibited doxycycline-dependent expression of eGFP; line 18 (T18) was chosen as a
model cell line for further studies.



Table 13: Oligonucleotides for mRNA analysis

Oligonucleotide       Sequence
eGFP primer 1 (5')    CTGCTGCCCGACAACCA (SEQ ID NO:144)
eGFP primer 2 (3')    CCATGTGATCGCGCTTCTC (SEQ ID NO:145)
eGFP probe            CCCAGTCCGCCCTGAGCAAAGA (SEQ ID NO:146)
GAPDH primer 1        CCATGTTCGTCATGGGTGTGA (SEQ ID NO:147)
GAPDH primer 2        CATGGACTGTGGTCATGAGT (SEQ ID NO:148)
GAPDH probe           TCCTGCACCACCAACTGCTTAGCA (SEQ ID NO:149)

Example 10: Generation of a donor sequence for correction of a defective
chromosomal eGFP gene

A donor construct containing the genetic information for correcting the
defective eGFPmut gene was constructed by PCR. The PCR reaction was carried out
as described above, using the peGFP-N1 vector as the template. To prevent
background expression of the donor construct in targeted recombination experiments,
the first 12 bp and start codon were removed from the donor by PCR using the
primers GFPnostart and GFP-Xba (sequences provided in Table 14). The resulting
PCR fragment (734 bp) was cloned into the pCR(R)4-TOPO vector, which does not
contain a mammalian cell promoter, by TOPO-TA cloning to create
pCR(R)4-TOPO-GFPdonor5 (Figure 28). The sequence of the eGFP insert of this construct
(corresponding to nucleotides 64-797 of the sequence shown in Figure 22) is shown in
Figure 29 (SEQ ID NO:20).



Table 14: Oligonucleotides for construction of donor molecule

Oligonucleotide   Sequence 5'-3'
GFPnostart        GGCGAGGAGCTGTTCAC (SEQ ID NO:150)
GFP-Xba           GATTATGATCTAGAGTCG (SEQ ID NO:151)


Example 11: Correction of a mutation in an integrated chromosomal
eGFP gene by targeted cleavage and recombination

The T18 stable cell line (Example 9) was transfected with one or both of the
ZFP-FokI expression plasmids (pcDNA3.1-GFP287-FokI and
pcDNA3.1-GFP296-FokI, Example 7) and 300 ng of the donor plasmid pCR(R)4-TOPO-GFPdonor5
(Example 10) using LipofectAMINE 2000 Reagent (Invitrogen) in Opti-MEM I
reduced serum medium, according to the manufacturer's protocol. Expression of the
defective chromosomal eGFP gene was induced 5-6 hours after transfection by the
addition of 2 ng/ml doxycycline to the culture medium. The cells were arrested in the
G2 phase of the cell cycle by the addition, at 24 hours post-transfection, of 100 ng/ml
nocodazole (Figure 30) or 0.2 µM vinblastine (Figure 31). G2 arrest was allowed to
continue for 24-48 hours, and was then released by the removal of the medium. The
cells were washed with PBS and the medium was replaced with DMEM containing
tetracycline-free FBS and 2 ng/ml doxycycline. The cells were allowed to recover for
24-48 hours, and gene correction efficiency was measured by monitoring the number
of cells exhibiting eGFP fluorescence, by fluorescence-activated cell sorting (FACS)
analysis. FACS analysis was carried out using a Beckman-Coulter EPICS XL-MCL
instrument and System II Data Acquisition and Display software, version 2.0. eGFP
fluorescence was detected by excitation at 488 nm with an argon laser and monitoring
emissions at 525 nm (x-axis). Background or autofluorescence was measured by
monitoring emissions at 570 nm (y-axis). Cells exhibiting high fluorescent emission
at 525 nm and low emission at 570 nm (region E) were scored positive for gene
correction.
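
The scoring rule described above (high emission at 525 nm together with low emission at 570 nm) can be sketched in Python as follows. The gate boundaries, the event values, and the function name are hypothetical placeholders; the actual boundaries of region E were set on the instrument and are not specified here.

```python
from typing import Iterable, Tuple


def percent_in_region_e(events: Iterable[Tuple[float, float]],
                        min_525: float, max_570: float) -> float:
    """Percent of events with emission at 525 nm >= min_525 and at 570 nm <= max_570.

    Each event is a pair (emission_525, emission_570) in arbitrary units.
    """
    events = list(events)
    if not events:
        raise ValueError("no events recorded")
    positive = sum(1 for em525, em570 in events
                   if em525 >= min_525 and em570 <= max_570)
    return 100.0 * positive / len(events)


if __name__ == "__main__":
    # Hypothetical events: only the first one falls inside the assumed gate.
    demo = [(900.0, 50.0), (20.0, 30.0), (850.0, 700.0), (15.0, 10.0)]
    print(percent_in_region_e(demo, min_525=500.0, max_570=100.0))
```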

The results are summarized in Table 15 and Figures 30 and 31. Figures 30
and 31 show results in which T18 cells were transfected with the
pcDNA3.1-GFP287-FokI and pcDNA3.1-GFP296-FokI plasmids encoding ZFP nucleases and the
pCR(R)4-TOPO-GFPdonor5 plasmid, eGFP expression was induced with
doxycycline, and cells were arrested in G2 with either nocodazole (Figure 30) or
vinblastine (Figure 31). Both Figures show FACS traces, in which cells exhibiting
eGFP fluorescence are represented in the lower right-hand portion of the trace
(identified as Region E, which is the portion of Quadrant 4 underneath the curve).
For transfected cells that had been treated with nocodazole, 5.35% of the cells
exhibited GFP fluorescence, indicative of correction of the mutant chromosomal
eGFP gene (Figure 30), while 6.7% of cells treated with vinblastine underwent eGFP
gene correction (Figure 31). These results are summarized, along with additional
control experiments, in Rows 1-8 of Table 15.

In summary, these experiments show that, in the presence of two ZFP
nucleases and a donor sequence, approximately 1% of treated cells underwent gene
correction, and that this level of correction was increased 4-5 fold by arresting treated
cells in the G2 phase of the cell cycle.

Table 15: Correction of a defective chromosomal eGFP gene

Expt.   Treatment(1)                                          Percent cells with
                                                              corrected eGFP gene(2)
1       300 ng donor only                                     0.01
2       100 ng ZFP 287 + 300 ng donor                         0.16
3       100 ng ZFP 296 + 300 ng donor                         0.6
4       50 ng ZFP 287 + 50 ng ZFP 296 + 300 ng donor          1.2
5       as 4 + 100 ng/ml nocodazole                           5.35
6       as 4 + 0.2 µM vinblastine                             6.7
7       no donor, no ZFP, 100 ng/ml nocodazole                0.01
8       no donor, no ZFP, 0.2 µM vinblastine                  0.0
9       100 ng ZFP287/Q486E + 300 ng donor                    0.0
10      100 ng ZFP296/E490K + 300 ng donor                    0.01
11      50 ng 287/Q486E + 50 ng 296/E490K + 300 ng donor      0.62
12      as 11 + 100 ng/ml nocodazole                          2.37
13      as 11 + 0.2 µM vinblastine                            2.56

1: T18 cells, containing a defective chromosomal eGFP gene, were transfected with plasmids
encoding one or two ZFP nucleases and/or a donor plasmid encoding a nondefective eGFP sequence,
and expression of the chromosomal eGFP gene was induced with doxycycline. Cells were optionally
arrested in G2 phase of the cell cycle after eGFP induction. FACS analysis was conducted 5 days after
transfection.

2: The number is the percent of total fluorescence exhibiting high emission at 525 nm and low
emission at 570 nm (region E of the FACS trace).
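
For convenience, the fold changes implied by Table 15 can be computed directly from the tabulated percentages. The Python sketch below is illustrative only; the dictionary and function names are hypothetical, and the values are those listed in the table.

```python
# Percent of cells with a corrected eGFP gene, keyed by experiment number (Table 15).
TABLE_15 = {
    1: 0.01, 2: 0.16, 3: 0.6, 4: 1.2, 5: 5.35, 6: 6.7, 7: 0.01,
    8: 0.0, 9: 0.0, 10: 0.01, 11: 0.62, 12: 2.37, 13: 2.56,
}


def fold_change(experiment: int, reference: int) -> float:
    """Ratio of corrected-cell percentages between two experiments."""
    baseline = TABLE_15[reference]
    if baseline == 0:
        raise ZeroDivisionError("reference experiment shows no correction")
    return TABLE_15[experiment] / baseline


if __name__ == "__main__":
    # G2 arrest relative to the unarrested two-nuclease experiments (rows 4 and 11).
    for arrested, reference in ((5, 4), (6, 4), (12, 11), (13, 11)):
        print(arrested, reference, round(fold_change(arrested, reference), 2))
```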

Example 12: Correction of a defective chromosomal gene using zinc
finger nucleases with sequence alterations in the dimerization interface

Zinc finger nucleases whose sequences had been altered in the dimerization
interface were tested for their ability to catalyze correction of a defective
chromosomal eGFP gene. The protocol described in Example 11 was used, except
that the nuclease portion of the ZFP nucleases (i.e., the FokI cleavage half-domains)
were altered as described in Example 5. Thus, an E490K cleavage half-domain was
fused to the GFP296 ZFP domain (Table 12), and a Q486E cleavage half-domain was
fused to the GFP287 ZFP (Table 12).

The results are shown in Rows 9-11 of Table 15 and indicate that a significant
increase in the frequency of gene correction was obtained in the presence of two ZFP
nucleases having alterations in their dimerization interfaces, compared to that
obtained in the presence of either of the nucleases alone. Additional experiments, in
which T18 cells were transfected with donor plasmid and plasmids encoding the
287/Q486E and 296/E490K zinc finger nucleases, then arrested in G2 with
nocodazole or vinblastine, showed a further increase in frequency of gene correction,
with over 2% of cells exhibiting eGFP fluorescence, indicative of a corrected
chromosomal eGFP gene (Table 15, Rows 12 and 13).


Example 13: Effect of donor length on frequency of gene correction

In an experiment similar to those described in Example 11, the effect of the
length of donor sequence on frequency of targeted recombination was tested. T18
cells were transfected with the two ZFP nucleases, and eGFP expression was induced
with doxycycline, as in Example 11. Cells were also transfected with either the
pCR(R)4-TOPO-GFPdonor5 plasmid (Figure 28) containing a 734 bp eGFP insert
(Figure 29) as in Example 11, or a similar plasmid containing a 1527 bp sequence
insert (Figure 32) homologous to the mutated chromosomal eGFP gene. Additionally,
the effect of G2 arrest with nocodazole on recombination frequency was assessed.

In a second experiment, donor lengths of 0.7, 1.08 and 1.5 kbp were
compared. T18 cells were transfected with 50 ng of the 287-FokI and 296-FokI
expression plasmids (Example 7, Table 12) and 500 ng of a 0.7 kbp, 1.08 kbp, or 1.5
kbp donor, as described in Example 11. Four days after transfection, cells were
assayed for correction of the defective eGFP gene by FACS, monitoring GFP
fluorescence.

The results of these two experiments, shown in Table 16, show that longer
donor sequence increases the frequency of targeted recombination (and, hence, of
gene correction) and confirm that arrest of cells in the G2 phase of the cell cycle also
increases the frequency of targeted recombination.


Table 16: Effect of donor length and cell-cycle arrest on targeted
recombination frequency

                     Experiment 1                      Experiment 2
                     Nocodazole concentration:
Donor length (kb)    0 ng/ml        100 ng/ml
0.7                  1.41           5.84               1.2
1.08                 not done       not done           2.2
1.5                  2.16           8.38               2.3

Note: Numbers represent percentage of total fluorescence in Region E of the FACS trace (see
Example 11) which is an indication of the fraction of cells that have undergone targeted recombination
to correct the defective chromosomal eGFP gene.
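
As an illustrative aid, the relative effect of donor length in Table 16 can be expressed as a ratio between donors within each condition. In the Python sketch below the column labels and function name are hypothetical, and entries reported as "not done" are represented by None.

```python
from typing import Optional

# Percent fluorescence in Region E from Table 16, keyed by donor length (kb).
TABLE_16 = {
    0.7:  {"exp1_no_arrest": 1.41, "exp1_nocodazole": 5.84, "exp2": 1.2},
    1.08: {"exp1_no_arrest": None, "exp1_nocodazole": None, "exp2": 2.2},
    1.5:  {"exp1_no_arrest": 2.16, "exp1_nocodazole": 8.38, "exp2": 2.3},
}


def length_ratio(column: str, longer: float, shorter: float) -> Optional[float]:
    """Ratio of recombination frequencies for two donor lengths, or None if not done."""
    numerator = TABLE_16[longer][column]
    denominator = TABLE_16[shorter][column]
    if numerator is None or denominator is None:
        return None
    return numerator / denominator


if __name__ == "__main__":
    for column in ("exp1_no_arrest", "exp1_nocodazole", "exp2"):
        print(column, length_ratio(column, 1.5, 0.7))
```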



Example 14: Editing of the endogenous human IL-2Rγ gene by targeted
cleavage and recombination using zinc finger nucleases

Two expression vectors, each encoding a ZFP-nuclease targeted to the human
IL-2Rγ gene, were constructed. Each ZFP-nuclease contained a zinc finger
protein-based DNA binding domain (see Table 17) fused to the nuclease domain of the
type IIS restriction enzyme FokI (amino acids 384-579 of the sequence of Wah et al.
(1998) Proc. Natl. Acad. Sci. USA 95:10564-10569) via a four amino acid ZC linker
(see Example 4). The nucleases were designed to bind to positions in exon 5 of the
chromosomal IL-2Rγ gene surrounding codons 228 and 229 (a mutational hotspot in
the gene) and to introduce a double-strand break in the DNA between their binding
sites.



Table 17: Zinc Finger Designs for exon 5 of the IL2Rγ Gene

Target sequence           F1          F2          F3          F4
ACTCTGTGGAAG              RSDNLSV     RNAHRIN     RSDTLSE     ARSTRTN
(SEQ ID NO:152) 5-8G      (SEQ ID     (SEQ ID     (SEQ ID     (SEQ ID
                          NO:153)     NO:154)     NO:155)     NO:156)
AAAGCGGCTCCG              RSDTLSE     ARSTRTT     RSDSLSK     QRSNLKV
(SEQ ID NO:157) 5-9D      (SEQ ID     (SEQ ID     (SEQ ID     (SEQ ID
                          NO:158)     NO:159)     NO:160)     NO:161)

Note: The zinc finger amino acid sequences shown above (in one-letter code) represent
residues -1 through +6, with respect to the start of the alpha-helical portion of each zinc finger. Finger
F1 is closest to the amino terminus of the protein.

The complete DNA-binding portion of each of the chimeric endonucleases
was as follows:

Nuclease targeted to ACTCTGTGGAAG (SEQ ID NO:152)

MAERPFQCRICMRNFSRSDNLSVHIRTHTGEKPFACDICGRKFARNAH
RINHTKIHTGSQKPFQCRICMRNFSRSDTLSEHIRTHTGEKPFACDICGRKFAA
RSTRTNHTKIHLRGS (SEQ ID NO:162)

Nuclease targeted to AAAGCGGCTCCG (SEQ ID NO:157)

MAERPFQCRICMRNFSRSDTLSEHIRTHTGEKPFACDICGRKFAARSTRTTHTK
IHTGSQKPFQCRICMRNFSRSDSLSKHIRTHTGEKPFACDICGRKFAQRSNLKV
HTKIHLRGS (SEQ ID NO:163)

Human embryonic kidney 293 cells were transfected (Lipofectamine 2000;
Invitrogen) with two expression constructs, each encoding one of the ZFP-nucleases
described in the preceding paragraph. The cells were also transfected with a donor
construct carrying as an insert a 1,543 bp fragment of the IL2Rγ locus corresponding
to positions 69195166-69196708 of the "minus" strand of the X chromosome (UCSC
human genome release July 2003), in the pCR4Blunt Topo (Invitrogen) vector. The
IL-2Rγ insert sequence contained the following two point mutations in the sequence
of exon 5 (underlined):

FRVRSRFNPLCGS (SEQ ID NO:164)
TTTCGTGTTCGGAGCCGGTTTAACCCGCTCTGTGGAAGT (SEQ ID NO:165)

The first mutation (CGC->CGG) does not change the amino acid sequence
(upper line) and serves to adversely affect the ability of the ZFP-nuclease to bind to
the donor DNA, and to chromosomal DNA following recombination. The second
mutation (CCA->CCG) does not change the amino acid sequence and creates a
recognition site for the restriction enzyme BsrBI.
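
The two properties claimed for the donor mutations (both are silent, and the second creates a BsrBI site) can be checked mechanically. The sketch below is illustrative only. The `unmutated` string is an assumption: it is not printed in the text and was inferred here by reverting the two stated changes, taking the first mutation to be in the sixth codon shown. The helper `translate` and the constant names are likewise hypothetical.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acid code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


BSRBI_SITE = "CCGCTC"  # BsrBI recognition sequence

# Donor sequence as given in SEQ ID NO:165.
donor = "TTTCGTGTTCGGAGCCGGTTTAACCCGCTCTGTGGAAGT"

# ASSUMPTION: unmutated sequence inferred by reverting the two stated changes
# (CGG->CGC in the sixth codon, CCG->CCA in the ninth); not printed in the text.
unmutated = "TTTCGTGTTCGGAGCCGCTTTAACCCACTCTGTGGAAGT"

print(translate(donor))        # peptide encoded by the donor
print(translate(unmutated))    # peptide encoded by the inferred unmutated sequence
print(BSRBI_SITE in donor, BSRBI_SITE in unmutated)
```

Under these assumptions both sequences encode the peptide of SEQ ID NO:164, and only the donor contains the BsrBI site, which is what makes the restriction digest diagnostic for recombination.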

Either 50 or 100 nanograms of each ZFP-nuclease expression construct and
0.5 or 1 microgram of the donor construct were used in duplicate transfections. The
following control experiments were also performed: transfection with an expression
plasmid encoding the eGFP protein; transfection with donor construct only; and
transfection with plasmids expressing the ZFP nucleases only. Twenty-four hours
after transfection, vinblastine (Sigma) was added to 0.2 µM final concentration to one
sample in each set of duplicates, while the other remained untreated. Vinblastine
affects the cell's ability to assemble the mitotic spindle and therefore acts as a potent
G2 arresting agent. This treatment was performed to enhance the frequency of
targeting because the homology-directed double-stranded break repair pathway is
more active than non-homologous end-joining in the G2 phase of the cell cycle.
Following a 48 hr period of treatment with 0.2 µM vinblastine, growth medium was
replaced, and the cells were allowed to recover from vinblastine treatment for an
additional 24 hours. Genomic DNA was then isolated from all cell samples using the
DNEasy Tissue Kit (Qiagen). Five hundred nanograms of genomic DNA from each
sample was then assayed for frequency of gene targeting, by testing for the presence
of a new BsrBI site in the chromosomal IL-2Rγ locus, using the assay described
schematically in Figure 33.

In brief, 20 cycles of PCR were performed using the primers shown in Table
18, each of which hybridizes to the chromosomal IL-2Rγ locus immediately outside
of the region homologous to the 1.5 kb donor sequence. Twenty microcuries each of
α-32P-dCTP and α-32P-dATP were included in each PCR reaction to allow detection
of PCR products. The PCR reactions were desalted on a G-50 column (Amersham),
and digested for 1 hour with 10 units of BsrBI (New England Biolabs). The digestion
products were resolved on a 10% non-denaturing polyacrylamide gel (BioRad), and
the gel was dried and autoradiographed (Figure 34). In addition to the major PCR
product, corresponding to the 1.55 kb amplified fragment of the IL2Rγ locus ("wt" in
Figure 34), an additional band ("rflp" in Figure 34) was observed in lanes
corresponding to samples from cells that were transfected with the donor DNA
construct and both ZFP-nuclease constructs. This additional band did not appear in
any of the control lanes, indicating that ZFP nuclease-facilitated recombination of the
BsrBI RFLP-containing donor sequence into the chromosome occurred in this
experiment.

Additional experiments, in which trace amounts of an RFLP-containing IL-2Rγ
DNA sequence were added to human genomic DNA (containing the wild-type IL-2Rγ
gene), and the resultant mixture was amplified and subjected to digestion with a
restriction enzyme which cleaves at the RFLP, have indicated that as little as 0.5%
RFLP-containing sequence can be detected quantitatively using this assay.

Table 18: Oligonucleotides for analysis of the human IL-2Rγ gene

Oligonucleotide    Sequence
Ex5_1.5detF1       GATTCAACCAGACAGATAGAAGG (SEQ ID NO:166)
Ex5_1.5detR1       TTACTGTCTCATCCTTTACTCC (SEQ ID NO:167)



Example 15: Targeted recombination at the IL-2Rγ locus in K562 cells

K562 is a cell line derived from a human chronic myelogenous leukemia. The
proteins used for targeted cleavage were FokI fusions to the 5-8G and 5-9D zinc
finger DNA-binding domains (Example 14, Table 17). The donor sequence was the
1.5 kbp fragment of the human IL-2Rγ gene containing a BsrBI site introduced by
mutation, described in Example 14.

K562 cells were cultured in RPMI Medium 1640 (Invitrogen), supplemented
with 10% fetal bovine serum (FBS) (Hyclone) and 2 mM L-glutamine. All cells were
maintained at 37°C in an atmosphere of 5% CO2. These cells were transfected by
Nucleofection™ (Solution V, Program T16) (Amaxa Biosystems), according to the
manufacturer's protocol, transfecting 2 million cells per sample. DNAs for
transfection, used in various combinations as described below, were a plasmid
encoding the 5-8G ZFP-FokI fusion endonuclease, a plasmid encoding the 5-9D
ZFP-FokI fusion endonuclease, a plasmid containing the donor sequence (described
above and in Example 14) and the peGFP-N1 vector (BD Biosciences) used as a
control.

In the first experiment, cells were transfected with various plasmids or
combinations of plasmids as shown in Table 19.

Table 19

Sample #   p-eGFP-N1   p5-8G     p5-9D     donor     vinblastine
1          5 µg
2                                          50 µg
3                                          50 µg     yes
4                      10 µg     10 µg
5                      5 µg      5 µg      25 µg
6                      5 µg      5 µg      25 µg     yes
7                      7.5 µg    7.5 µg    25 µg
8                      7.5 µg    7.5 µg    25 µg     yes
9                      7.5 µg    7.5 µg    50 µg
10                     7.5 µg    7.5 µg    50 µg     yes

Vinblastine-treated cells were exposed to 0.2 µM vinblastine at 24 hours after
transfection for 30 hours. The cells were collected, washed twice with PBS, and
re-plated in growth medium. Cells were harvested 4 days after transfection for analysis
of genomic DNA.

Genomic DNA was extracted from the cells using the DNEasy kit (Qiagen).
One hundred nanograms of genomic DNA from each sample were used in a PCR
reaction with the following primers:

Exon 5 forward: GCTAAGGCCAAGAAAGTAGGGCTAAAG (SEQ ID NO:168)

Exon 5 reverse: TTCCTTCCATCACCAAACCCTCTTG (SEQ ID NO:169)

These primers amplify a 1,669 bp fragment of the X chromosome
corresponding to positions 69195100-69196768 on the "-" strand (UCSC human
genome release July 2003) that contains exon 5 of the IL2Rγ gene. Amplification of
genomic DNA which has undergone homologous recombination with the donor DNA
yields a product containing a BsrBI site, whereas the amplification product of
genomic DNA which has not undergone homologous recombination with donor DNA
will not contain this restriction site.
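
The fragment sizes quoted for the donor insert (Example 14) and for this PCR product follow directly from the chromosomal coordinates. The sketch below is a simple arithmetic check. The helper names `length` and `contains` are hypothetical, and the coordinates are treated as inclusive intervals, which is an assumption that happens to reproduce the stated sizes.

```python
# Coordinates quoted in Examples 14 and 15 (X chromosome, "-" strand,
# UCSC human genome release July 2003). Treated here as inclusive intervals.
donor_region = (69195166, 69196708)     # donor insert, Example 14
amplicon_region = (69195100, 69196768)  # PCR product, Example 15


def length(region):
    """Length in bp of an inclusive coordinate interval."""
    start, end = region
    return end - start + 1


def contains(outer, inner):
    """True if the inner interval lies strictly inside the outer interval."""
    return outer[0] < inner[0] and inner[1] < outer[1]


# Extent of amplicon sequence lying beyond each end of the donor homology.
flank_low = donor_region[0] - amplicon_region[0]
flank_high = amplicon_region[1] - donor_region[1]

print(length(donor_region))       # donor insert length in bp
print(length(amplicon_region))    # amplicon length in bp
print(contains(amplicon_region, donor_region), flank_low, flank_high)
```

Because the amplicon extends beyond the donor homology on both sides, both primers anneal to sequence absent from the donor plasmid, so the plasmid itself should not serve as a template for the full-length product.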

Ten microcuries each of α-32P-dCTP and α-32P-dATP were included in each
amplification reaction to allow visualization of reaction products. Following 20
cycles of PCR, the reaction was desalted on a Sephadex G-50 column (Pharmacia),
and digested with 10 units of BsrBI (New England Biolabs) for 1 hour at 37°C. The
reaction was then resolved on a 10% non-denaturing polyacrylamide gel, dried, and
exposed to a PhosphorImager screen.

The results of this experiment are shown in Figure 35. When cells were
transfected with the control GFP plasmid, donor plasmid alone or the two
ZFP-encoding plasmids in the absence of donor, no BsrBI site was present in the
amplification product, as indicated by the absence of the band marked "rflp" in the
lanes corresponding to these samples in Figure 35. However, genomic DNA of cells
that were transfected with the donor plasmid and both ZFP-encoding plasmids
contained the BsrBI site introduced by homologous recombination with the donor
DNA (band labeled "rflp"). Quantitation of the percentage of signal represented by
the RFLP-containing DNA, shown in Figure 35, indicated that, under optimal
conditions, up to 18% of all IL-2Rγ genes in the transfected cell population were
altered by homologous recombination.

A second experiment was conducted according to the protocol just described,
except that the cells were expanded for 10 days after transfection. DNAs used for
transfection are shown in Table 20.



Table 20

Sample #   p-eGFP-N1   p5-8G     p5-9D     donor     vinblastine
1          50 µg
2                                          50 µg
3                                          50 µg     yes
4                      7.5 µg    7.5 µg
5                      5 µg      5 µg      25 µg
6                      5 µg      5 µg      25 µg     yes
7                      7.5 µg    7.5 µg    50 µg
8                      7.5 µg    7.5 µg    50 µg     yes


Analysis of BsrBI digestion of amplified DNA, shown in Figure 36, again
demonstrated that up to 18% of IL-2Rγ genes had undergone sequence alteration
through homologous recombination, after multiple rounds of cell division. Thus, the
targeted recombination events are stable.

In addition, DNA from transfected cells in this second experiment was
analyzed by Southern blotting. For this analysis, twelve micrograms of genomic
DNA from each sample were digested with 100 units of EcoRI, 50 units of BsrBI, and
40 units of DpnI (all from New England Biolabs) for 12 hours at 37°C. This digestion
generates a 7.7 kbp EcoRI fragment from the native IL-2Rγ gene (lacking a BsrBI
site) and fragments of 6.7 and 1.0 kbp from a chromosomal IL-2Rγ gene whose
sequence has been altered, by homologous recombination, to include the BsrBI site.
DpnI, a methylation-dependent restriction enzyme, was included to destroy the
dam-methylated donor DNA. Unmethylated K562 cell genomic DNA is resistant to DpnI
digestion.

Following digestion, genomic DNA was purified by phenol-chloroform
extraction and ethanol precipitation, resuspended in TE buffer, and resolved on a
0.8% agarose gel along with a sample of genomic DNA digested with EcoRI and SphI
to generate a size marker. The gel was processed for alkaline transfer following
standard procedure and DNA was transferred to a nylon membrane (Schleicher and
Schuell). Hybridization to the blot was then performed by using a radiolabelled
fragment of the IL-2Rγ locus corresponding to positions 69198428-69198769 of the
"-" strand of the X chromosome (UCSC human genome July 2003 release). This
region of the gene is outside of the region homologous to donor DNA. After
hybridization, the membrane was exposed to a PhosphorImager plate and the data
quantitated using Molecular Dynamics software. Alteration of the chromosomal
IL-2Rγ sequence was measured by analyzing the intensity of the band corresponding to
the EcoRI-BsrBI fragment (arrow next to autoradiograph; BsrBI site indicated by
filled triangle in the map above the autoradiograph).

The results, shown in Figure 37, indicate that up to 15% of chromosomal IL-2Rγ
sequences were altered by homologous recombination, thereby confirming the results
obtained by PCR analysis that the targeted recombination event was stable through
multiple rounds of cell division. The Southern blot results also indicate that the
results shown in Figure 36 do not result from an amplification artifact.

Example 16: Targeted recombination at the IL-2Rγ locus in
CD34-positive hematopoietic stem cells

Genetic diseases (e.g., severe combined immune deficiency (SCID) and sickle
cell anemia) can be treated by homologous recombination-mediated correction of the
specific DNA sequence alteration responsible for the disease. In certain cases,
maximal efficiency and stability of treatment would result from correction of the
genetic defect in a pluripotent cell. To this end, this example demonstrates alteration
of the sequence of the IL-2Rγ gene in human CD34-positive bone marrow cells.
CD34+ cells are pluripotential hematopoietic stem cells which give rise to the
erythroid, myeloid and lymphoid lineages.

Bone marrow-derived human CD34 cells were purchased from AllCells, LLC
and shipped as frozen stocks. These cells were thawed and allowed to stand for 2
hours at 37°C in an atmosphere of 5% CO2 in RPMI Medium 1640 (Invitrogen),
supplemented with 10% fetal bovine serum (FBS) (Hyclone) and 2 mM L-glutamine.
Cell samples (1 x 10^6 or 2 x 10^6 cells) were transfected by Nucleofection™ (Amaxa
Biosystems) using the Human CD34 Cell Nucleofector™ Kit, according to the
manufacturer's protocol. After transfection, cells were cultured in RPMI Medium
1640 (Invitrogen), supplemented with 10% FBS, 2 mM L-glutamine, 100 ng/ml
granulocyte-colony stimulating factor (G-CSF), 100 ng/ml stem cell factor (SCF),
100 ng/ml thrombopoietin (TPO), 50 ng/ml Flt3 Ligand, and 20 ng/ml Interleukin-6
(IL-6). The caspase inhibitor zVAD-FMK (Sigma-Aldrich) was added to a final
concentration of 40 µM in the growth medium immediately after transfection to block
apoptosis. Additional caspase inhibitor was added 48 hours later to a final
concentration of 20 µM to further prevent apoptosis. These cells were maintained at
37°C in an atmosphere of 5% CO2 and were harvested 3 days post-transfection.
Cell numbers and DNAs used for transfection are shown in Table 21.



Table 21

Sample   # cells     p-eGFP-N1 (1)   Donor (2)   p5-8G (3)   p5-9D (3)
1        1 x 10^6    5 µg
2        2 x 10^6                    50 µg
3        2 x 10^6                    50 µg       7.5 µg      7.5 µg

2. The donor DNA is a 1.5 kbp fragment containing sequences from exon 5 of the IL-2Rγ
gene with an introduced BsrBI site (see Example 14).

3. These are plasmids encoding FokI fusions with the 5-8G and 5-9D zinc finger DNA
binding domains (see Table 17).

Genomic DNA was extracted from the cells using the MasterPure DNA
Purification Kit (Epicentre). Due to the presence of glycogen in the precipitate,
accurate quantitation of this DNA used as input in the PCR reaction is impossible;
estimates using analysis of ethidium bromide-stained agarose gels indicate that ca. 50
ng genomic DNA was used in each sample. Thirty cycles of PCR were then
performed using the following primers, each of which hybridizes to the chromosomal
IL-2Rγ locus immediately outside of the region homologous to the 1.5 kb donor:

ex5_1.5detF3 GCTAAGGCCAAGAAAGTAGGGCTAAAG (SEQ ID NO:170)
ex5_1.5detR3 TTCCTTCCATCACCAAACCCTCTTG (SEQ ID NO:171)



Twenty microcuries each of α-32P-dCTP and α-32P-dATP were included in each
PCR reaction to allow detection of PCR products. To provide an in-gel quantitation
reference, the existence of a spontaneously occurring SNP in exon 5 of the
IL-2Rγ gene in Jurkat cells was exploited: this SNP creates an RFLP by destroying
a MaeII site that is present in normal human DNA. A reference standard was
therefore created by adding 1 or 10 nanograms of normal human genomic DNA
(obtained from Clontech, Palo Alto, CA) to 100 or 90 ng of Jurkat genomic DNA,
respectively, and performing the PCR as described above. The PCR reactions were
desalted on a G-50 column (Amersham), and digested for 1 hour with restriction
enzyme: experimental samples were digested with 10 units of BsrBI (New England
Biolabs); the "reference standard" reactions were digested with MaeII. The digestion
products were resolved on a 10% non-denaturing polyacrylamide gel (BioRad), and the
gel was dried and analyzed by exposure to a PhosphorImager plate (Molecular Dynamics).
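
The composition of the two reference standards follows from the stated DNA masses. The sketch below is a rough illustration only: it assumes that the mass fraction of normal DNA approximates the fraction of MaeII-cleavable template, which ignores any copy-number differences between Jurkat and normal genomic DNA. The function name `fraction_normal` is hypothetical.

```python
def fraction_normal(normal_ng, jurkat_ng):
    """Mass fraction of normal genomic DNA in a normal/Jurkat mixture."""
    return normal_ng / (normal_ng + jurkat_ng)


# (normal ng, Jurkat ng) pairs as stated in the text.
standards = [(1, 100), (10, 90)]

for normal_ng, jurkat_ng in standards:
    pct = 100 * fraction_normal(normal_ng, jurkat_ng)
    print(f"{normal_ng} ng normal + {jurkat_ng} ng Jurkat: {pct:.1f}% normal DNA")
```

Under this assumption the two standards correspond to approximately 1% and 10% cleavable template, which brackets the range over which the experimental RFLP signal is compared.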

The results are shown in Figure 38. In addition to the major PCR product,
corresponding to the 1.6 kb fragment of the IL2Rγ locus ("wt" in the right-hand panel
of Figure 38), an additional band (labeled "rflp") was observed in lanes corresponding
to samples from cells that were transfected with plasmids encoding both
ZFP-nucleases and the donor DNA construct. This additional band did not appear in the
control lanes, consistent with the idea that ZFP-nuclease assisted gene targeting of
exon 5 of the common gamma chain gene occurred in this experiment.

Although accurate quantitation of the targeting rate is complicated by the
proximity of the RFLP band to the wild-type band, the targeting frequency was
estimated, by comparison to the reference standard (left panel), to be between 1%
and 5%.

Example 17: Donor-target homology effects

The effect, on frequency of homologous recombination, of the degree of
homology between donor DNA and the chromosomal sequence with which it
recombines was examined in the T18 cell line, described in Example 9. This line
contains a chromosomally integrated defective eGFP gene, and the donor DNA
contains sequence changes, with respect to the chromosomal gene, that correct the
defect.

Accordingly, the donor sequence described in Example 10 was modified, by
PCR mutagenesis, to generate a series of ~700 bp donor constructs with different
degrees of non-homology to the target. All of the modified donors contained
sequence changes that corrected the defect in the chromosomal eGFP gene and
contained additional silent mutations (DNA mutations that do not change the
sequence of the encoded protein) inserted into the coding region surrounding the
cleavage site. These silent mutations were intended to prevent the binding to, and
cleavage of, the donor sequence by the zinc finger-cleavage domain fusions, thereby
reducing competition between the intended chromosomal target and the donor
plasmid for binding by the chimeric nucleases. In addition, following homologous
recombination, the ability of the chimeric nucleases to bind and re-cleave the
newly-inserted chromosomal sequences (and possibly stimulate another round of
recombination, or cause non-homologous end joining or other double-strand
break-driven alterations of the genome) would be minimized.

Four different donor sequences were tested. Donor 1 contains 8 mismatches
with respect to the chromosomal defective eGFP target sequence, Donor 2 has 10
mismatches, Donor 3 has 6 mismatches, and Donor 5 has 4 mismatches. Note that the
sequence of Donor 5 is identical to wild-type eGFP sequence, but contains 4
mismatches with respect to the defective chromosomal eGFP sequence in the T18 cell
line. Table 22 provides the sequence of each donor between nucleotides 201-242.
Nucleotides that are divergent from the sequence of the defective eGFP gene
integrated into the genome of the T18 cell line are shown in bold and underlined. The
corresponding sequences of the defective chromosomal eGFP gene (GFP mut) and the
normal eGFP gene (GFP wt) are also shown.



Table 22

Donor      Sequence                                      SEQ ID NO.
Donor 1    CTTCAGCCGCTATCCAGACCACATGAAACAACACGACTTCTT    172
Donor 2    CTTCAGCCGGTATCCAGACCACATGAAACAACATGACTTCTT    173
Donor 3    CTTCAGCCGCTACCCAGACCACATGAAACAGCACGACTTCTT    174
Donor 5    CTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTT    175
GFP mut    CTTCAGCCGCTACCCCTAACAC--GAAGCAGCACGACTTCTT    176
GFP wt     CTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTT    177


The T18 cell line was transfected, as described in Example 11, with 50 ng of
the 287-FokI and 296-FokI expression constructs (Example 7 and Table 12) and 500
ng of each donor construct. FACS analysis was conducted as described in Example
11.

The results, shown in Table 23, indicate that a decreasing degree of mismatch
between donor and chromosomal target sequence (i.e., increased homology) results in
an increased frequency of homologous recombination as assessed by restoration of
GFP function.

Table 23 (1)

Donor      # mismatches    Percent cells with corrected eGFP gene (2)
Donor 2    10              0.45%
Donor 1    8               0.53%
Donor 3    6               0.89%
Donor 5    4               1.56%

1: T18 cells, containing a defective chromosomal eGFP gene, were transfected with plasmids
encoding two ZFP nucleases and with donor plasmids encoding a nondefective eGFP sequence having
different numbers of sequence mismatches with the chromosomal target sequence. Expression of the
chromosomal eGFP gene was induced with doxycycline and FACS analysis was conducted 5 days after
transfection.

2: The number is the percent of total fluorescence exhibiting high emission at 525 nm and low
emission at 570 nm (region E of the FACS trace).

The foregoing results show that levels of homologous recombination are
increased by decreasing the degree of target-donor sequence divergence. Without
wishing to be bound by any particular theory or to propose a particular mechanism, it
is noted that greater homology between donor and target could facilitate homologous
recombination by increasing the efficiency by which the cellular homologous
recombination machinery recognizes the donor molecule as a suitable template.
Alternatively, an increase in donor homology to the target could also lead to cleavage
of the donor by the chimeric ZFP nucleases. A cleaved donor could help facilitate
homologous recombination by increasing the rate of strand invasion or could aid in
the recognition of the cleaved donor end as a homologous stretch of DNA during
homology search by the homologous recombination machinery. Moreover, these
possibilities are not mutually exclusive.

Example 18: Preparation of siRNA

To test whether decreasing the cellular levels of proteins involved in
non-homologous end joining (NHEJ) facilitates targeted homologous recombination, an
experiment in which levels of the Ku70 protein were decreased through siRNA
inhibition was conducted. siRNA molecules targeted to the Ku70 gene were
generated by transcription of Ku70 cDNA followed by cleavage of double-stranded
transcript with Dicer enzyme.

Briefly, a cDNA pool generated from 293 and U2OS cells was used in five
separate amplification reactions, each using a different set of amplification primers
specific to the Ku70 gene, to generate five pools of cDNA fragments (pools A-E),
ranging in size from 500-750 bp. Fragments in each of these five pools were then
re-amplified using primers containing the bacteriophage T7 RNA polymerase promoter
element, again using a different set of primers for each cDNA pool. cDNA generation
and PCR reactions were performed using the Superscript Choice cDNA system and
Platinum Taq High Fidelity Polymerase (both from Invitrogen, Carlsbad, CA),
according to the manufacturer's protocols and recommendations.

Each of the amplified DNA pools was then transcribed in vitro with
bacteriophage T7 RNA polymerase to generate five pools (A-E) of double stranded
RNA (dsRNA), using the RNAMAXX in vitro transcription kit (Stratagene, San
Diego, CA) according to the manufacturer's instructions. After precipitation with
ethanol, the RNA in each of the pools was resuspended and cleaved in vitro using
recombinant Dicer enzyme (Stratagene, San Diego, CA) according to the
manufacturer's instructions. 21-23 bp siRNA products in each of the five pools were
purified by a two-step method, first using a Microspin G-25 column (Amersham),
followed by a Microcon YM-100 column (Amicon). Each pool of siRNA products
was transiently transfected into the T7 cell line using Lipofectamine 2000®.

Western blots to assay the relative effectiveness of the siRNA pools in
suppressing Ku70 expression were performed approximately 3 days post-transfection.
Briefly, cells were lysed and disrupted using RIPA buffer (Santa Cruz
Biotechnology), and homogenized by passing the lysates through a QIAshredder
(Qiagen, Valencia, CA). The clarified lysates were then treated with SDS PAGE
sample buffer (with β-mercaptoethanol used as the reducing agent) and boiled for 5
minutes. Samples were then resolved on a 4-12% gradient NUPAGE gel and
transferred onto a PVDF membrane. The upper portion of the blot was exposed to an
anti-Ku70 antibody (Santa Cruz sc-5309) and the lower portion exposed to an
anti-TFIIB antibody (Santa Cruz sc-225, used as an input control). The blot was then
exposed to horseradish peroxidase-conjugated goat anti-mouse secondary antibody
and processed for electrochemiluminescent (ECL) detection using a kit from Pierce
Chemical Co. according to the manufacturer's instructions.

Figure 39 shows representative results following transfection of two of the
siRNA pools (pools D and E) into T7 cells. Transfection with 70 ng of siRNA E
results in a significant decrease in Ku70 protein levels (Figure 39, lane 3).

Example 19: Increasing the Frequency of Homologous Recombination by
Inhibition of Expression of a Protein Involved in Non-Homologous End Joining

Repair of a double-stranded break in genomic DNA can proceed along two
different cellular pathways: homologous recombination (HR) or non-homologous end
joining (NHEJ). Ku70 is a protein involved in NHEJ, which binds to the free DNA
ends resulting from a double-stranded break in genomic DNA. To test whether
lowering the intracellular concentration of a protein involved in NHEJ increases the
frequency of HR, small interfering RNAs (siRNAs), prepared as described in
Example 18, were used to inhibit expression of Ku70 mRNA, thereby lowering levels
of Ku70 protein, in cells co-transfected with donor DNA and with plasmids encoding
chimeric nucleases.

For these experiments, the T7 cell line (see Example 9 and Figure 27) was
used. These cells contain a chromosomally-integrated defective eGFP gene, but have
been observed to exhibit lower levels of targeted homologous recombination than the
T18 cell line used in Examples 11-13.

T7 cells were transfected, as described in Example 11, with either 70 or 140
ng of one of two pools of dicer product targeting Ku70 (see Example 18). Protein
blot analysis was performed on extracts derived from the transfected cells to
determine whether the treatment of cells with siRNA resulted in a decrease in the
levels of the Ku70 protein (see previous Example). Figure 39 shows that levels of the
Ku70 protein were reduced in cells that had been treated with 70 ng of siRNA from
pool E.

Separate cell samples in the same experiment were co-transfected with 70 or
140 ng of siRNA (pool D or pool E) along with 50 ng each of the 287-FokI and
296-FokI expression constructs (Example 7 and Table 12) and 500 ng of the 1.5 kbp GFP
donor (Example 13), to determine whether lowering Ku70 levels increased the
frequency of homologous recombination. The experimental protocol is described in
Table 24. Restoration of eGFP activity, due to homologous recombination, was
assayed by FACS analysis as described in Example 11.

Table 24

Expt. #   Donor (1)   ZFNs (2)      siRNA (3)        % correction (4)
1         500 ng                                     0.05
2                     50 ng each                     0.01
3         500 ng      50 ng each
4         500 ng      50 ng each    70 ng pool D     0.68
5         500 ng      50 ng each    140 ng pool D    0.59
6         500 ng      50 ng each    70 ng pool E     1.25
7         500 ng      50 ng each    140 ng pool E    0.92

1. A plasmid containing a 1.5 kbp sequence encoding a functional eGFP protein which is
homologous to the chromosomally integrated defective eGFP gene

2. Plasmids encoding the eGFP-targeted 287 and 296 zinc finger protein/FokI fusion
endonucleases

3. See Example 18

4. Percent of total fluorescence exhibiting high emission at 525 nm and low emission at 570
nm (region E of the FACS trace, see Example 11).

The percent correction of the defective eGFP gene in the transfected T7 cells
(indicative of the frequency of targeted homologous recombination) is shown in the
right-most column of Table 24. The highest frequency of targeted recombination is
observed in Experiment 6, in which cells were transfected with donor DNA, plasmids
encoding the two eGFP-targeted fusion nucleases and 70 ng of siRNA Pool E.
Reference to Example 18 and Figure 39 indicates that 70 ng of Pool E siRNA
significantly depressed Ku70 protein levels. Thus, methods that reduce cellular levels
of proteins involved in NHEJ can be used as a means of facilitating homologous
recombination.

Example 20: Zinc finger-FokI fusion nucleases targeted to the human
β-globin gene

A number of four-finger zinc finger DNA binding domains, targeted to the
human β-globin gene, were designed and plasmids encoding each zinc finger domain,
fused to a FokI cleavage half-domain, were constructed. Each zinc finger domain
contained four zinc fingers and recognized a 12 bp target site in the region of the
human β-globin gene encoding the mutation responsible for Sickle Cell Anemia. The
binding affinity of each of these proteins to its target sequence was assessed, and four
proteins exhibiting strong binding (sca-r29b, sca-36a, sca-36b, and sca-36c) were
used for construction of FokI fusion endonucleases.

The target sites of the ZFP DNA binding domains, aligned with the sequence 
of the human fJ-globin gene, are shown below. The translational start codon (ATG) is 
5 in bold and underlined, as is the A-T substitution causing Sickle Cell Anemia. 

sca-36a                                      GAAGTCTGCCGT (SEQ ID NO:178)
sca-36b                                      GAAGTCtGCCGTT (SEQ ID NO:179)
sca-36c                                      GAAGTCtGCCGTT (SEQ ID NO:180)
          CAAACAGACACCATGGTGCATCTGACTCCTGTGGAGAAGTCTGCCGTTACTG
          GTTTGTCTGTGGTACCACGTAGACTGAGGACACCTCTTCAGACGGCAATGAC (SEQ ID NO:181)
sca-r29b                  ACGTAGaCTGAGG (SEQ ID NO:182)

Amino acid sequences of the recognition regions of the zinc fingers in these four
proteins are shown in Table 25. The complete amino acid sequences of these zinc finger
domains are shown in Figure 40. The sca-36a domain recognizes a target site having 12
contiguous nucleotides (shown in upper case above), while the other three domains recognize a
thirteen nucleotide sequence consisting of two six-nucleotide target sites (shown in upper case)
separated by a single nucleotide (shown in lower case). Accordingly, the sca-r29b, sca-36b and
sca-36c domains contain a non-canonical inter-finger linker having the amino acid sequence
TGGGGSQKP (SEQ ID NO: 183) between the second and the third of their four fingers.
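
The placement of these target sites can be checked with a short script. The following is a minimal sketch, not part of the disclosed methods: it assumes the top strand of SEQ ID NO:181 and the target sites exactly as printed above, and it treats the sca-r29b site as lying on the complementary (lower) strand in the left-to-right order in which it is printed. The helper names (complement, locate) are hypothetical.

```python
# Sketch: locate the ZFP target sites on the beta-globin fragment shown above
# and measure the spacing between the sca-r29b site and each sca-36 site.
# Sequences are copied from the alignment; function names are illustrative only.

TOP_STRAND = "CAAACAGACACCATGGTGCATCTGACTCCTGTGGAGAAGTCTGCCGTTACTG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, kept in the same left-to-right order."""
    return seq.upper().translate(_COMPLEMENT)


def locate(site, strand):
    """Return the 0-based, half-open (start, end) span of site within strand."""
    start = strand.find(site.upper())
    if start < 0:
        raise ValueError(f"site {site} not found")
    return start, start + len(site)


# Lower strand as printed in the alignment (complement of the top strand).
BOTTOM_STRAND = complement(TOP_STRAND)

# sca-r29b is printed beneath the lower strand; the sca-36 sites above the top strand.
left_span = locate("ACGTAGaCTGAGG", BOTTOM_STRAND)
right_sites = {
    "sca-36a": "GAAGTCTGCCGT",
    "sca-36b": "GAAGTCtGCCGTT",
    "sca-36c": "GAAGTCtGCCGTT",
}
right_spans = {name: locate(site, TOP_STRAND) for name, site in right_sites.items()}

# Gap = number of base pairs between the near edges of the two binding sites.
gaps = {name: span[0] - left_span[1] for name, span in right_spans.items()}

for name, gap in gaps.items():
    print(f"sca-r29b / {name}: near edges separated by {gap} bp")
```

Under these assumptions the near edges of the sca-r29b site and each sca-36 site are separated by 6 bp.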

Table 25

ZFP       F1                       F2                       F3                       F4
sca-r29b  QSGDLTR (SEQ ID NO:184)  TSANLSR (SEQ ID NO:185)  DRSALSR (SEQ ID NO:186)  QSGHLSR (SEQ ID NO:187)
sca-36a   RSQTRKT (SEQ ID NO:188)  QKRNRTK (SEQ ID NO:189)  DRSALSR (SEQ ID NO:190)  QSGNLAR (SEQ ID NO:191)
sca-36b   TSGSLSR (SEQ ID NO:192)  DRSDLSR (SEQ ID NO:193)  DRSALSR (SEQ ID NO:194)  QSGNLAR (SEQ ID NO:195)
sca-36c   TSSSLSR (SEQ ID NO:196)  DRSDLSR (SEQ ID NO:197)  DRSALSR (SEQ ID NO:198)  QSGNLAR (SEQ ID NO:199)

Example 21: In vitro cleavage of a DNA target sequence by β-globin-
targeted ZFP/FokI fusion endonucleases

Fusion proteins containing a FokI cleavage half-domain and one of the four ZFP
DNA binding domains described in the previous example were tested for their ability
to cleave DNA in vitro with the predicted sequence specificity. These ZFP domains
were cloned into the pcDNA3.1 expression vector via KpnI and BamHI sites and
fused in-frame to the FokI cleavage domain via a 4 amino acid ZC linker, as described
above. A DNA fragment containing 700 bp of the human β-globin gene was cloned
from genomic DNA obtained from K562 cells. The isolation and sequence of this
fragment were described in Example 3, supra.

To produce fusion endonucleases (ZFNs) for the in vitro assay, circular
plasmids encoding FokI fusions to the sca-r29b, sca-36a, sca-36b, and sca-36c proteins
were incubated in an in vitro transcription/translation system. See Example 4. A total
of 2 µl of the TNT reaction (2 µl of a single reaction when a single protein was being
assayed or 1 µl of each reaction when a pair of proteins was being assayed) was added
to 13 µl of the cleavage buffer mix and 3 µl of labeled probe (~1 ng/µl). The probe
was end-labeled with 32P using polynucleotide kinase. This reaction was incubated
for 1 hour at room temperature to allow binding of the ZFNs. Cleavage was
stimulated by the addition of 8 µl of 8 mM MgCl2, diluted in cleavage buffer, to a
final concentration of approximately 2.5 mM. The cleavage reaction was incubated
for 1 hour at 37°C and stopped by the addition of 11 µl of phenol/chloroform. The
DNA was isolated by phenol/chloroform extraction and analyzed by gel
electrophoresis, as described in Example 4. As a control, 3 µl of probe was analyzed
on the gel to mark the migration of uncut DNA (labeled "U" in Figure 41).
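
The stated final MgCl2 concentration follows from the volumes given above. The following is a minimal sketch of that dilution arithmetic; it assumes that the binding reaction consists only of the three listed components (TNT reaction, cleavage buffer mix and probe) and that none of them contributes additional magnesium. The function name is hypothetical.

```python
# Sketch: dilution arithmetic for the MgCl2 addition described above.
# Assumption: the binding reaction itself contains no MgCl2.

def final_concentration_mM(stock_mM, stock_ul, reaction_ul):
    """Concentration after adding stock_ul of stock to reaction_ul of reaction."""
    total_ul = stock_ul + reaction_ul
    return stock_mM * stock_ul / total_ul


# 2 ul TNT reaction + 13 ul cleavage buffer mix + 3 ul labeled probe
binding_reaction_ul = 2 + 13 + 3

# 8 ul of 8 mM MgCl2 added to the binding reaction
mgcl2_mM = final_concentration_mM(stock_mM=8.0, stock_ul=8.0,
                                  reaction_ul=binding_reaction_ul)

print(f"Final MgCl2 concentration: {mgcl2_mM:.2f} mM")
```

With these volumes the total reaction is 26 µl and the result is about 2.46 mM, consistent with the "approximately 2.5 mM" stated in the text.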

The results are shown in Figure 41. Incubation of the target DNA with any
single zinc finger/FokI fusion resulted in no change in size of the template DNA.
However, the combination of the sca-r29b nuclease with either of the sca-36b or
sca-36c nucleases resulted in cleavage of the target DNA, as evidenced by the presence of
two shorter DNA fragments (rightmost two lanes of Figure 41).

Example 22: ZFP/FokI fusion endonucleases, targeted to the β-globin
gene, tested in a chromosomal GFP reporter system

A DNA fragment containing the human β-globin gene sequence targeted by
the ZFNs described in Example 20 was synthesized and cloned into a SpeI site in an
eGFP reporter gene, thereby disrupting eGFP expression. The fragment contained the
following sequence (in which the nucleotide responsible for the sickle cell mutation is
in bold and underlined):

CTAGACACCATGGTGCATCTGACTCCTGTGGAGAAGTCTGCCGTTACTGCCCTAG (SEQ ID NO:200)

This disrupted eGFP gene containing inserted β-globin sequences was cloned
into pcDNA4/TO (Invitrogen, Carlsbad, CA) using the HindIII and NotI sites, and the
resulting vector was transfected into HEK293 TRex cells (Invitrogen). Individual
stable clones were isolated and grown up, and the clones were tested for targeted
homologous recombination by transfecting plasmids encoding each of the sca-36
proteins (sca-36a, sca-36b, sca-36c) paired with sca-29b (see Example 20 and Table 25
for sequences and binding sites of these chimeric nucleases). Cells were transfected
with 50 ng of plasmid encoding each of the ZFNs and with 500 ng of the 1.5-kb GFP Donor
(Example 13). Five days after transfection, cells were tested for homologous
recombination at the inserted defective eGFP locus. Initially, cells were examined by
fluorescence microscopy for eGFP function. Cells exhibiting fluorescence were then
analyzed quantitatively using a FACS assay for eGFP fluorescence, as described in
Example 11.

The results showed that all cell lines transfected with sca-29b and sca-36a
were negative for eGFP function, when assayed by fluorescence microscopy. Some
of the lines transfected with sca-29b paired with either sca-36b or sca-36c were
positive for eGFP expression, when assayed by fluorescence microscopy, and were
therefore further analyzed by FACS analysis. The results of FACS analysis of two of
these lines are shown in Table 26, and indicate that zinc finger nucleases targeted to
β-globin sequences are capable of catalyzing sequence-specific double-stranded DNA
cleavage to facilitate homologous recombination in living cells.

Table 26

Cell line  DNA transfected:                       % corr. (1)
           sca-29b  sca-36a  sca-36b  sca-36c
#20        +        +        -        -           0
           +        -        +        -           0.08
           +        -        -        +           0.07
#40        +        +        -        -           0
           +        -        +        -           0.18
           +        -        -        +           0.12

1. Percent of total fluorescence exhibiting high emission at 525 nm and low emission at 570
nm (region E of the FACS trace, see Example 11).

Example 23: Effect of transcription level on targeted homologous
recombination

Since transcription of a chromosomal DNA sequence involves alterations in its
chromatin structure (generally to make the transcribed sequences more accessible), it
is possible that an actively transcribed gene might be a more favorable substrate for
targeted homologous recombination. This idea was tested using the T18 cell line
(Example 9) which contains chromosomal sequences encoding a defective eGFP gene
whose transcription is under the control of a doxycycline-inducible promoter.

Separate samples of T18 cells were transfected with plasmids encoding the
eGFP-targeted 287 and 296 zinc finger/FokI fusion proteins (Example 7) and a 1.5
kbp donor DNA molecule containing sequences that correct the defect in the
chromosomal eGFP gene (Example 9). Five hours after transfection, transfected cells
were treated with different concentrations of doxycycline, then eGFP mRNA levels
were measured 48 hours after addition of doxycycline. eGFP fluorescence at 520 nm
(indicative of targeted recombination of the donor sequence into the chromosome to
replace the inserted β-globin sequences) was measured by FACS at 4 days after
transfection.

The results are shown in Figure 42. Increasing steady-state levels of eGFP
mRNA normalized to GAPDH mRNA (equivalent, to a first approximation, to the
rate of transcription of the defective chromosomal eGFP gene) are indicated by the
bars. The number above each bar indicates the percent of cells exhibiting eGFP
fluorescence. The results show that increasing transcription rate of the target gene is
accompanied by higher frequencies of targeted recombination. This suggests that
targeted activation of transcription (as disclosed, e.g., in co-owned U.S. Patents
6,534,261 and 6,607,882) can be used, in conjunction with targeted DNA cleavage, to
stimulate targeted homologous recombination in cells.

Example 24: Generation of a cell line containing a mutation in the
IL-2Rγ gene

K562 cells were transfected with plasmids encoding the 5-8GL0 and the
5-9DL0 zinc finger nucleases (ZFNs) (see Example 14; Table 17) and with a 1.5 kbp
DraI donor construct. The DraI donor comprises a sequence with homology to
the region encoding the fifth exon of the IL-2Rγ gene, but inserts an extra base between
the ZFN-binding sites to create a frameshift and generate a DraI site.

24 hours post-transfection, cells were treated with 0.2 µM vinblastine (final
concentration) for 30 hours. Cells were washed three times with PBS and re-plated in
medium. Cells were allowed to recover for 3 days and an aliquot of cells was
removed to perform a PCR-based RFLP assay, similar to that described in Example
14, testing for the presence of a DraI site. It was determined that the gene correction
frequency within the population was approximately 4%.

Cells were allowed to recover for an additional 2 days and 1600 individual
cells were plated into 40 x 96-well plates in 100 µl of medium.

The cells are grown for about 3 weeks, and cells homozygous for the DraI
mutant phenotype are isolated. The cells are tested for genome modification (by
testing for the presence of a DraI site in exon 5 of the IL-2Rγ gene) and for levels of
IL-2Rγ mRNA (by real-time PCR) and protein (by Western blotting) to determine the
effect of the mutation on gene expression. Cells are tested for function by FACS
analysis.
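
The plating density used for clone isolation can be examined with a short calculation. The following is a rough sketch only: it assumes that cells distribute among wells according to a Poisson distribution, which is a common simplification for limiting-dilution plating and is not stated in this example. Variable names are illustrative.

```python
# Sketch: expected well occupancy when 1600 cells are plated into
# 40 x 96-well plates. Assumption (not from the text): Poisson seeding.
import math

cells_plated = 1600
wells = 40 * 96                      # 3840 wells in total

mean_cells_per_well = cells_plated / wells

# Poisson probabilities for a well receiving 0, exactly 1, or 2+ cells
p_empty = math.exp(-mean_cells_per_well)
p_single = mean_cells_per_well * math.exp(-mean_cells_per_well)
p_multiple = 1.0 - p_empty - p_single

print(f"Mean cells per well: {mean_cells_per_well:.3f}")
print(f"Empty wells:         {p_empty:.1%}")
print(f"Single-cell wells:   {p_single:.1%}")
print(f"Multi-cell wells:    {p_multiple:.1%}")
```

Under this assumption the mean is about 0.42 cells per well, so most occupied wells would be expected to hold a single cell, which suits isolation of clonal populations.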

Cells containing the DraI frameshift mutation in the IL-2Rγ gene are
transfected with plasmids encoding the 5-8GL0 and 5-9DL0 fusion proteins and a 1.5
kb BsrBI donor construct (Example 14) to replace the DraI frameshift mutation with a
sequence encoding a functional protein. Levels of homologous recombination greater
than 1% are obtained in these cells, as measured by assaying for the presence of a
BsrBI site as described in Example 14. Recovery of gene function is demonstrated by
measuring mRNA and protein levels and by FACS analysis.

All patents, patent applications and publications mentioned herein are hereby
incorporated by reference, in their entireties, for all purposes.

Although disclosure has been provided in some detail by way of illustration
and example for the purposes of clarity of understanding, it will be apparent to those
skilled in the art that various changes and modifications can be practiced without
departing from the spirit or scope of the disclosure. Accordingly, the foregoing
descriptions and examples should not be construed as limiting.

CLAIMS

What is claimed is:

1. A method for cleaving cellular chromatin in a region of interest, the
method comprising:

(a) selecting a first nucleotide sequence in the region of interest;

(b) engineering a first zinc finger binding domain to bind to the first sequence;
and

(c) expressing a first fusion protein in the cell, the first fusion protein
comprising the engineered zinc finger binding domain and a cleavage domain;

wherein the first fusion protein binds to the first nucleotide sequence and the
cellular chromatin is cleaved in the region of interest.

2. The method of claim 1, wherein the cleavage domain comprises two
cleavage half-domains.

3. The method of claim 2 wherein the cleavage half-domains are derived
from the same endonuclease.

4. The method of claim 3, wherein the cleavage half domains are derived
from a Type IIS restriction endonuclease.

5. The method of claim 4, wherein the Type IIS restriction endonuclease
is Fok I.

6. The method of claim 1, wherein the first fusion protein comprises a first
cleavage half-domain and the method further comprises:

expressing a second fusion protein in the cell, the second fusion protein
comprising:

(i) a second zinc finger binding domain and

(ii) a cleavage half-domain;

wherein the second fusion protein binds to a second nucleotide
sequence located between 2 and 50 nucleotides from the first nucleotide sequence.

7. The method of claim 6, wherein cleavage occurs between the first and
second nucleotide sequences.

8. The method of claim 6 wherein the second zinc finger binding domain
is engineered to bind to the second nucleotide sequence.

9. The method of claim 6 wherein the cleavage half-domains of the first
and second fusion proteins are from the same endonuclease.

10. The method of claim 9 wherein the endonuclease is a Type IIS
restriction endonuclease.

11. The method of claim 10 wherein the Type IIS restriction endonuclease
is Fok I.

12. A method for cleaving cellular chromatin in a region of interest, the
method comprising:

(a) selecting the region of interest;

(b) engineering a first zinc finger binding domain to bind to a first nucleotide
sequence in the region of interest;

(c) providing a second zinc finger binding domain which binds to a second
nucleotide sequence in the region of interest, wherein the second sequence is located
between 2 and 50 nucleotides from the first sequence;

(d) expressing a first fusion protein in the cell, the first fusion protein
comprising the first zinc finger binding domain and a first cleavage half-domain; and

(e) expressing a second fusion protein in the cell, the second fusion protein
comprising the second zinc finger binding domain and a second cleavage half domain;

wherein the first fusion protein binds to the first nucleotide sequence, and the
second fusion protein binds to the second nucleotide sequence, and further wherein
said binding positions the cleavage half-domains such that the cellular chromatin is
cleaved in the region of interest.

13. The method of claim 12, wherein cleavage occurs between the first and
second nucleotide sequences.

14. The method of claim 12 wherein the second zinc finger binding
domain is engineered to bind to the second nucleotide sequence.

15. The method of claim 12 wherein the first and second cleavage
half-domains are from the same endonuclease.

16. The method of claim 15 wherein the endonuclease is a Type IIS
restriction endonuclease.

17. The method of claim 16 wherein the Type IIS restriction endonuclease
is Fok I.

18. The method of claim 12 wherein the cellular chromatin is in a
chromosome.

19. The method of claim 12, wherein the first cleavage half domain is from
a Type IIS restriction endonuclease.

20. The method of claim 12, wherein the second cleavage half domain is
from a Type IIS restriction endonuclease.

21. A method for altering a first nucleotide sequence in a region of interest
in cellular chromatin, the method comprising:

(a) engineering a first zinc finger binding domain to bind to a second
nucleotide sequence in the region of interest, wherein the second sequence comprises
at least 9 nucleotides;

(b) providing a second zinc finger binding domain to bind to a third nucleotide
sequence, wherein the third sequence comprises at least 9 nucleotides and is located
between 2 and 50 nucleotides from the second sequence;

(c) expressing a first fusion protein in the cell, the first fusion protein
comprising the first zinc finger binding domain and a first cleavage half-domain;

(d) expressing a second fusion protein in the cell, the second fusion protein
comprising the second zinc finger binding domain and a second cleavage half domain;
and

(e) contacting the cell with a polynucleotide comprising a fourth nucleotide
sequence, wherein the fourth nucleotide sequence is homologous but non-identical
with the first nucleotide sequence;

wherein binding of the first fusion protein to the second sequence, and binding
of the second fusion protein to the third sequence, positions the cleavage half-domains
such that the cellular chromatin is cleaved in the region of interest, thereby facilitating
homologous recombination between the first nucleotide sequence and the fourth
nucleotide sequence, resulting in alteration of the first nucleotide sequence.

22. The method of claim 21, wherein cleavage occurs between the second
and third nucleotide sequences.

23. The method of claim 21 wherein the second zinc finger binding
domain is engineered to bind to the third nucleotide sequence.

24. The method of claim 21 wherein the first and second cleavage
half-domains are from the same endonuclease.

25. The method of claim 24 wherein the endonuclease is a Type IIS
restriction endonuclease.

26. The method of claim 25 wherein the Type IIS restriction endonuclease
is Fok I.

27. The method of claim 21 wherein the cellular chromatin is in a
chromosome.

28. The method of claim 21 wherein the fourth nucleotide sequence
contains sequences not present in the region of interest that are flanked by sequences
homologous to the region of interest.

29. The method of claim 21, wherein the first nucleotide sequence
comprises a mutation in a gene.

30. The method of claim 29, wherein the mutation is selected from the
group consisting of at least one of a point mutation, a substitution, a deletion and an
insertion.

31. The method of claim 29 wherein the fourth nucleotide sequence
comprises the wild-type sequence of the gene.

32. The method of claim 29 wherein the cellular chromatin is cleaved at a
site located within 100 nucleotides on either side of the mutation.

33. The method of claim 21 wherein, in step (e), the polynucleotide is
linear.

34. The method of claim 21 wherein, in step (e), the polynucleotide is
circular.

35. The method of claim 21 wherein, in step (e), the polynucleotide is
double-stranded.

36. The method of claim 21 wherein, in step (e), the polynucleotide is
single-stranded.

37. The method of claim 21 wherein, in step (e), the cell is contacted with
a virus comprising the polynucleotide.

38. The method of claim 37, wherein the virus is an adenovirus or an
adeno-associated virus.

39. The method of claim 21 wherein, in steps (c) and (d), the cell is
contacted with two polynucleotides, the first polynucleotide encoding the first fusion
protein and the second polynucleotide encoding the second fusion protein.

40. The method of claim 21 wherein, in steps (c) and (d), the cell is
contacted with a polynucleotide encoding two fusion proteins, the first fusion protein
comprising the first zinc finger binding domain and a first cleavage half-domain and
the second fusion protein comprising the second zinc finger binding domain and a
second cleavage half-domain.

41. The method of claim 21 wherein the cell is arrested in the G2 phase of
the cell cycle.

42. The method of claim 21 wherein the near edges of the second and third
nucleotide sequences are separated by 6 base pairs.

43. The method of claim 42 wherein, in at least one of the fusion proteins,
the amino acid sequence between the zinc finger binding domain and the cleavage
half-domain (Z-C linker) consists of 4 amino acid residues.

44. The method of claim 21 wherein at least one of the fusion proteins
comprises an alteration in the amino acid sequence of the dimerization interface of the
cleavage half-domain.

45. The method of claim 21 wherein the second and third nucleotide
sequences are present in the polynucleotide comprising the fourth nucleotide
sequence.

46. The method of claim 45, wherein the fourth polynucleotide sequence is
cleaved.

47. The method of claim 21, wherein the first nucleotide sequence is
converted to the fourth nucleotide sequence.

48. A method for altering a first nucleotide sequence in a region of interest
in cellular chromatin, the method comprising:

(a) engineering a zinc finger binding domain to bind to a second nucleotide
sequence in the region of interest, wherein the second sequence comprises at least 9
nucleotides;

(b) expressing a fusion protein in the cell, the fusion protein comprising the
zinc finger binding domain and two cleavage half-domains; and

(c) contacting the cell with a polynucleotide comprising a third nucleotide
sequence, wherein the third nucleotide sequence is homologous but non-identical with
the first nucleotide sequence;

wherein the fusion protein binds to the second sequence and cleaves the
cellular chromatin in the region of interest, thereby facilitating homologous
recombination between the first nucleotide sequence and the third nucleotide
sequence, resulting in alteration of the first nucleotide sequence.

49. A method for altering a first nucleotide sequence in a region of interest
in cellular chromatin, the method comprising:

(a) engineering a first zinc finger binding domain to bind to a second
nucleotide sequence in the region of interest, wherein the second sequence comprises
at least 9 nucleotides;

(b) providing a second zinc finger binding domain to bind to a third nucleotide
sequence, wherein the third sequence comprises at least 9 nucleotides and is located
between 2 and 50 nucleotides from the second sequence;

(c) expressing a first fusion protein in the cell, the first fusion protein
comprising the first zinc finger binding domain and a first cleavage half-domain; and

(d) expressing a second fusion protein in the cell, the second fusion protein
comprising the second zinc finger binding domain and a second cleavage half domain;

wherein the first fusion protein binds to the second sequence, and the second
fusion protein binds to the third sequence, thereby positioning the cleavage
half-domains such that the cellular chromatin is cleaved at a cleavage site in the first
nucleotide sequence, and non-homologous end joining at the cleavage site results in
alteration of the first nucleotide sequence.

50. The method of claim 49, wherein cleavage occurs between the second
and third nucleotide sequences.

51. The method of claim 49 wherein the second zinc finger binding
domain is engineered to bind to the third nucleotide sequence.

52. The method of claim 49 wherein the first and second cleavage
half-domains are from the same endonuclease.

53. The method of claim 52 wherein the endonuclease is a Type IIS
restriction endonuclease.

54. The method of claim 53 wherein the Type IIS restriction endonuclease
is Fok I.

55. The method of claim 49 wherein the cellular chromatin is in a
chromosome.

56. The method of claim 49, wherein the non-homologous end joining
generates a sequence alteration in the cellular chromatin.

57. The method of claim 56, wherein the sequence alteration is selected
from the group consisting of a point mutation, a substitution, a deletion and an
insertion.

58. The method of claim 47 wherein, in steps (c) and (d), the cell is
contacted with two polynucleotides, the first polynucleotide encoding the first fusion
protein and the second polynucleotide encoding the second fusion protein.

59. The method of claim 49 wherein, in steps (c) and (d), the cell is
contacted with a polynucleotide encoding two fusion proteins, the first fusion protein
comprising the first zinc finger binding domain and a first cleavage half-domain and
the second fusion protein comprising the second zinc finger binding domain and a
second cleavage half-domain.

60. The method of claim 49 wherein the near edges of the second and third
nucleotide sequences are separated by 6 base pairs.

61. The method of claim 60 wherein, in at least one of the fusion proteins,
the amino acid sequence between the zinc finger binding domain and the cleavage
half-domain (Z-C linker) consists of 4 amino acid residues.

62. The method of claim 49 wherein at least one of the fusion proteins
comprises an alteration in the amino acid sequence of the dimerization interface of the
cleavage half-domain.

MetGlyPheLeuLysLeuIle (SEQ ID NO:2)
CTGCCGCCGGCGCCGCGGCCGTCATGGGGTTCCTGAAACTGATT

GACGGCGGCCGCGGCGCCGGCAGTACCCCAAGGACTTTGACTAA (SEQ ID NO:1)
I I I I I I I I

TAGTCCTGCAGGTTTAAACGAATTCGCCCTTC TCAGCAAGCGTGAGCTCA 50

i o o 

i&ci&CTCCTC 150 

ATGCAACAAGAAAAGGGGGCGGAGGCACCACGCCAGTCGTCAGCTCGCTC 200 

CTCGTATACGCAACATCAGTCCCCGCCCCTGGTCCCACTCCTGCCGGAAG 250

GCGAAGATCCCGTTAGGCCTGGACGTATTCTCGCGACATTTGCCGGTCGC 300 

CCGGCTTGCACTGCGGCGTTTCCCGCGCGGGCTACCTCAGTTCTCGGGCG 350 

TACGGCGCGGCCTGTCCTACTGCTGCCGGCGCCGCGGCCGTCAT aaqaaq 400

cTTCCTGAAACTGATTG AAGGGCGAATTCGCGGCCGCTAAATTCAATTCG 450
CCCTATAGTGAGT 

(SEQ ID NO:6)

TyrLysAsnSerAspAsnAspLysVal (SEQ ID NO:8)

CTTCCAACCTTTCTCCTCTAGG TACAAGAACTCG GATAATGATAAAGTCC
GAAGGTT GGAAAGAGG AGATCCATGTTCTTGAGCCTATTACTATTTCAGG (SEQ ID NO:7)

I I I I I I I I I I

GTTCCTCTTCCTTCCAACCTTTCTCCTCTAGGTACAAGAACTCGGATAATGATAAAGTC
CAAGGAGAAGGAAGGTTGGAAAGAGGAGATCCATGTTCTTGAGCCTATTACTATTTCAG

TAGTCCTGCAGGTTTAAACGAATTCGCCCTT TCCTCTAGGTA aAAGAAtT 50

CcGAcAAcGATAAAGTCCAGAAGTGCAGCCACTATCTATTCCCTGAAGAA 100 

ATCACTTCTGGCTGTCAGTTGCAAAAAAAGGAGATCCACCTCTACCAAAC 150 

ATTTGTTGTTC AGCTCCAGGACCCACGGGAACCCAGGAGACAGGCCACAC 200 

AGATGCTAAAACTGCAGAATCTGGGTAATTTGGAAAGAAAGGGTCAAGAG 250 

ACC AGGGATACTGTGGGACATTGGAGTCTAC AGAGTAGTGTTCTTTTATC 300 

ATAAGGGTACATGGGCAGAAAAGAGGAGGTAGGGGATCATGATGGGAAGG 350 

GAGGAGGTATTAGGGGCACTACCTTCAGGATCCTGACTTGTCTAGGCCAG 400 

GGGAATGACCACATATGCACACATATCTCCAGTGATCCCCTGGGCTCCAG 450

AG AACCTAAC ACTTCAC 500 

^£^frM$?!?^ 5 5 0 

T ^GGGCGAAT 600 

ATTACAATTCACTGGCCGTCGTTT 



(SEQ ID NO:12)

TACTGATGGTATGGGGCCAAGAGATATATCTTAGAGGGAGGGCTGAGGGT 50

TTGAAGTCCAACTCCTAAGCCAGTGCCAGAAGAGCCAAGGACAGGTACGG 100 

CTGTCATC ACTTAGACCTCACCCTGTGGAGCCACACCCTAGGGTTGGCCA 150 

ATCT ACTCCC AGGAGCAGGGAGGGCAGGAGCCAGGGCTGGGCATAAAAGT 200

CAGGGCAGAGCCATCTATTGCTT ACATTTGCTTCTGACACAACTGTGTTC 250 

ACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGAGGAGAAG 300 

TCTG CCGTTACTGCCC TGTGGGGC AAGGTGAACGTG GATGAAGTTGGTGG 350 

TGAGGCCCTGGGCAGGTTGGTATCAAGGTT ACAAGACAGGTTTAAGGAGA 400

CC AATAGAAACTGGGC ATGTGGAGACAGAGAAGACTCTTGGGTTTCTGAT 450

AGGCACTGACTCTCTCTGCCTATTGGTCTATTTTCCCACCCTTAGGCTGC 500 

TGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTG 550 

TCCACTCCTG ATGCTGTTATGGGC AACCCT AAGGTG AAGGCTC ATGGCAA 600 

GAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCA 650 

AGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTG 700 



(SEQ ID NO:13)

TGCTTACCAAGCTGTGATTCCAAATATT 50

GGATGTTTTTACT 100 

TCTTAGAGGGAGGGC 150 

GAAGAGCC AAGGAC AGGTACGGCTGTCATCACTT AGACCTCACCCTGTGG 200 

AGCCACACCCTAGGGTTGGCCAATCTACTCCCAGGAGCAGGGAGGGCAGG 250 

AGCCAGGGCTGGGCATAAAAGTCAGGGCAGAGCCATCTATTGCTTACATT 300 

TGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGT 350 

GCATCTGACTCCTGAGGAGAAGTCTG qCGTTAgTGCCCqaa ttccqAt cG 400 
TcAACcac 



(SEQ ID NO:14)

5-8

CACGTTTCGTGTTCGGAGCCGCTTTAACCC ACTCTGTGGAAG 
GTGCAAAGCACAA GCCTCGGCGAAATTGGGTGAGACACCTTC 
5-10 



(SEQ ID NO:15)

  1 MAPKKKRKVG IHGVPAAMAE RPFQCRICMR NFSRSDNLSE HIRTHTGEKP FACDICGRKF
 61 ARNAHRINHT KIHTGSQKPF QCRICMRNFS RSDTLSEHIR THTGEKPFAC DICGRKFAAR
121 STRTTHTKIH LRQKDAARGS QLVKSELEEK KSELRHKLKY VPHEYIELIE IARNSTQDRI
181 LEMKVMEFFM KVYGYRGKHL GGSRKPDGAI YTVGSPIDYG VIVDTKAYSG GYNLPIGQAD
241 EMQRYVEENQ TRNKHINPNE WWKVYPSSVT EFKFLFVSGH FKGNYKAQLT RLNHITNCNG
301 AVLSVEELLI GGEMIKAGTL TLEEVRRKFN NGEINF

(SEQ ID NO:16)

  1 MAPKKKRKVG IHGVPAAMAE RPFQCRICMR NFSRSDSLSR HIRTHTGEKP FACDICGRKF
 61 ADSSNRKTHT KIHTGGGGSQ KPFQCRICMR NFSRSDSLSV HIRTHTGEKP FACDICGRKF
121 ADRSNRITHT KIHLRQKDAA RGSQLVKSEL EEKKSELRHK LKYVPHEYIE LIEIARNSTQ
181 DRILEMKVME FFMKVYGYRG KHLGGSRKPD GAIYTVGSPI DYGVIVDTKA YSGGYNLPIG
241 QADEMQRYVE ENQTRNKHIN PNEWWKVYPS SVTEFKFLFV SGHFKGNYKA QLTRLNHITN
301 CNGAVLSVEE LLIGGEMIKA GTLTLEEVRR KFNNGEINF

(SEQ ID NO:17)

CGAATTCTGCAGTCGACGGTACCGCGGGCCCGGGATCCACCGGTCGCCACCATGGTGAGC
AAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTA
AACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTG 
ACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACC 
ACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCC GACCACAT GAAGCAGCACGAC 
TTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGAC 
GACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC 
ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAG 
TACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAG 
GTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTAC 
CAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGC 
ACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAG 
TTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAAAGCGGCCGC 
GACTCTAGATCATAATC 



(SEQ ID NO:18)

CGAATTCTGCAGTCGACGGTACCGCGGGCCCGGGATCCACCGGTCGCCACCATGGTGAGC
AAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTA 
AACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTG 
ACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACC 
ACCCTGACCTACGGCGTGCAGTGCTTCAGC CGCTACCCC TAACAC GAAGCAGCA CGACTT 
CTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGA 
CGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCAT 
CGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTA 
CAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGT 
GAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCA 
GCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCAC 
CCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTT 
CGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAAAGCGGCCGCGA 
CTCTAGATCATAATC 



(SEQ ID NO:19)

GGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAAC
GGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACC 
CTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACC 
CTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTC 
TTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGAC 
GGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATC 
GAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTAC 
AACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTG
AACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAG 
CAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACC 
CAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTC 
GTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAAAGCGGCCGCGAC 
TCTAGATCATAATC

FIGURE 32



GGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAAC 
GGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACC 
CTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACC 
CTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTC 
TTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGAC 
GGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATC 
GAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTAC 
AACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTG 
AACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAG 
CAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACC 
CAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTC 
GTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAAAGCGGCCGCTCG 
AGTCTAGAGGGCCCGTTTAAACCCGCTGATCAGCCTCGACTGTGCCTTCTAGTTGCCAGC 
CATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCTGGAAGGTGCCACTCCCACTG 
TCCTTTCCTAATAAAATGAGGAAATTGCATCGCATTGTCTGAGTAGGTGTCATTCTATTC 
TGGGGGGTGGGGTGGGGCAGGACAGCAAGGGGGAGGATTGGGAAGACAATAGCAGGCATG 
CTGGGGATGCGGTGGGCTCTATGGCTTCTGAGGCGGAAAGAACCAGCTGGGGCTCTAGGG 
GGTATCCCCACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTTACGCGCA 
GCGTGACCGCTACACTTGCCAGCGCCCTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTCCT 
TTCTCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGGCTCCCTTTAGGGT 
TCCGATTTAGTGCTTTACGGCACCTCGACCCCAAAAAACTTGATTAGGGTGATGGTTCAC 
GTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCCCTTTGACGTTGGAGTCCACGTTCT 
TTAATAGTGGACTCTTGTTCCAAACTGGAACAACACTCAACCCTATCTCGGTCTATTCTT 
TTGATTTATAAGGGATTTTGCCGATTTCGGCCTATTGGTTAAAAAATGAGCTGATTTAAC 
AAAAATTTAACGCGAATTAATTCTGTGGAATGTGTGTCAGTTAGGGTGTGGAAAGTCCCC 
AGGCTCCCCAGCAGGCAGAAGTATGCA 
sca-29b: 

MAERPFQCRICMRNFS QSGDLTR HIRTHTGEKPFACDICGRKF ATSANLSR HTK 
IHTGGGGSQKPFQCRICMRNFSDRSALSRHIRTHTGEKPFACDICGRKFAQSG 
HLSR HTKIH (SEQ ID NO:22) 



sca-36a: 

MAERPFQCRICMRNFS RSQTRKT HIRTHTGEKPFACDICGRKF AQKRNRTK HT 
KIHTGSQKPFQCRICMRNFS DRSALSR HIRTHTGEKPFACDICGRKF AQSGNLA 
RHTKIH (SEQ ID NO:23) 



sca-36b: 

MAERPFQCRICMRNFS TSGSLSR HIRTHTGEKPFACDICGRKF ADRSDLSR HTK 
IHTGGGGSQKPFQCRICMRNFSDRSALSRHIRTHTGEKPFACDICGRKFAQSG 
NLAR HTKIH (SEQ ID NO:24) 



sca-36c: 

MAERPFQCRICMRNFS TSSSLSR HIRTHTGEKPFACDICGRKF ADRSDLSR HTK 
IHTGGGGSQKPFQCRICMRNFS DRSALSR HIRTHTGEKPFACDICGRKF AQSG 
NLAR HTKIH (SEQ ID NO:25) 